How to get all substrings from captured groups in regex in Vala? - regex

I am writing an application in Vala that uses regex. What I need to do is wrap all hashtags in a string in tags, and I don't quite understand how regex works in Vala.
Currently I am trying to do something like this:
Regex hashtagRegex = new Regex("(#[\\p{L}0-9_]+)[ #]");
MatchInfo info;
if (hashtagRegex.match_all_full(string, -1, 0, 0, out info))
{
foreach(string hashTag in info.fetch_all())
string = string.replace(hashTag, "" + hashTag + "");
}
but it parses only the first hashtag, and includes the space at the end of it.
I am using [ #] at the end of the regex because some users don't separate hashtags with spaces and just write a bunch of hashtags like this: #hashtag1#hashtag2#hashtag3, and I want to handle that too.
What I need to do is to somehow get an array of all hashtags in string to use it to wrap all of them in tags. How can I do it?

What I need to do is to somehow get an array of all hashtags in string to use it to wrap all of them in tags. How can I do it?
No it isn't.
Try something like this:
private static int main (string[] args) {
try {
GLib.Regex hashtagRegex = new GLib.Regex ("#([a-zA-Z0-9_\\-]+)");
string res = hashtagRegex.replace_eval ("foo #bar baz #qux qoo", -1, 0, 0, (mi, s) => {
s.append_printf ("%s", mi.fetch (1), mi.fetch (0));
return false;
});
GLib.message (res);
} catch (GLib.Error e) {
GLib.error (e.message);
}
return 0;
}
I don't know what characters are valid in a hash tag, but you can always tweak the regex as needed. The important part is using a callback to perform the replacement.
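For readers who want to see the callback idea outside of Vala, here is a minimal Java sketch of the same technique (Java 9+ `Matcher.replaceAll` with a function argument). The class name `HashtagWrapper`, the `<b>` wrapper tag, and the character class are assumptions chosen for illustration; the post does not show which tags are actually wanted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagWrapper {
    // Assumed hashtag syntax: '#' followed by letters, digits or underscores.
    // No trailing [ #] is needed: a match simply stops at the next '#' or space.
    private static final Pattern HASHTAG = Pattern.compile("#[\\p{L}0-9_]+");

    // Wraps every hashtag in the (hypothetical) <b>...</b> tag via a callback.
    static String wrap(String text) {
        return HASHTAG.matcher(text).replaceAll(
            m -> Matcher.quoteReplacement("<b>" + m.group() + "</b>"));
    }

    public static void main(String[] args) {
        System.out.println(wrap("foo #bar baz #tag1#tag2"));
    }
}
```

Because the separator is not part of the pattern, hashtags written without spaces between them are each matched on their own, and no trailing space ends up inside the wrapped text.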

Related

JavaFX - TextField with regex for zipcode

For my program I want to use a TextField where the user can enter a zip code (German ones). For that I tried what you can see below. If the user enters more than 5 digits, every additional digit should be deleted immediately. Of course, letters are not allowed.
When I use the pattern ^[0-9]{0,5}$ on https://regex101.com/ it does what I intended, but when I try it in JavaFX it doesn't work. I couldn't find a solution yet.
Can anyone tell me what I did wrong?
Edit: For people, who didn't work with JavaFX yet: When the user enters just one character, the method check(String text) is called. So the result should also be true, when there are 1 to 5 digits. But not more ;-)
public class NumberTextField extends TextField{
ErrorLabel label;
NumberTextField(String text, ErrorLabel label){
setText(text);
setFont(Font.font("Calibri", 17));
setMinHeight(35);
setMinWidth(200);
setMaxWidth(200);
this.label = label;
}
NumberTextField(){}
@Override
public void replaceText(int start, int end, String text){
if(check(text)) {
super.replaceText(start, end, text);
}
}
@Override
public void replaceSelection(String text){
if(check(text)){
super.replaceSelection(text);
}
}
private boolean check(String text){
if(text.matches("^[0-9]{0,5}$")){
label.setText("Success");
label.setBlack();
return true;
} else{
return false;
}
}
}
You don't need to extend TextField to do this. In fact, I recommend using a TextFormatter, since it is simpler to implement:
It does not require you to override multiple methods. You simply decide, based on the data about the desired input, whether to allow the change or not.
final Pattern pattern = Pattern.compile("\\d{0,5}");
TextFormatter<?> formatter = new TextFormatter<>(change -> {
if (pattern.matcher(change.getControlNewText()).matches()) {
// todo: remove error message/markup
return change; // allow this change to happen
} else {
// todo: add error message/markup
return null; // prevent change
}
});
TextField textField = new TextField();
textField.setTextFormatter(formatter);
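One difference from the code in the question is worth spelling out: the check here runs against the text the field would contain after the edit (`getControlNewText()`), not just against the inserted fragment. A minimal sketch of that logic without any JavaFX dependency, assuming a hypothetical helper `ZipInput.applyEdit` that mimics what `replaceText(start, end, text)` would do:

```java
import java.util.regex.Pattern;

class ZipInput {
    // Partial input while typing: zero to five digits.
    private static final Pattern PARTIAL = Pattern.compile("\\d{0,5}");
    // Finished zip code: exactly five digits.
    private static final Pattern COMPLETE = Pattern.compile("\\d{5}");

    // Returns the text after replacing [start, end) with inserted, or the
    // unchanged text if the result would not be a valid partial zip code.
    static String applyEdit(String current, int start, int end, String inserted) {
        String candidate = current.substring(0, start) + inserted + current.substring(end);
        return PARTIAL.matcher(candidate).matches() ? candidate : current;
    }

    // matches() must consume the whole input, so no ^ and $ anchors are needed.
    static boolean isComplete(String text) {
        return COMPLETE.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(applyEdit("1234", 4, 4, "5"));  // edit accepted
        System.out.println(applyEdit("12345", 5, 5, "6")); // edit rejected, text unchanged
    }
}
```

Checking only the inserted fragment, as in the question, accepts a sixth digit because a single digit on its own still matches the pattern.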
Your original expression should work fine. If we wish to validate a five-digit zip code, though, we might want to drop the 0 from the quantifier:
^[0-9]{5}$
^\d{5}$
For validation purposes we might want to keep the start and end anchors; for just testing, however, we can remove them and see:
[0-9]{5}
\d{5}
Without the anchors, it is likely that some other characters that we do not wish to accept would get through.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "^[0-9]{5}$";
final String string = "01234\n"
+ "012345\n"
+ "0\n"
+ "1234";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}

phrase search in meteor search-source package

I have a Meteor app to which I added the search-source package to search certain collections, and it works partially. That is, when I search for the term foo bar, it returns results for each of "foo" and "bar". This is fine, but I also want to be able to wrap the terms in quotes, like "foo bar", and get results for an exact match only. At the moment, when I do this I get an empty set. Here is my server code:
//Server.js
SearchSource.defineSource('FruitBasket', function(searchText, options) {
// options = options || {}; // to be sure that options is at least an empty object
if(searchText) {
var regExp = buildRegExp(searchText);
var selector = {$or: [
{'fruit.name': regExp},
{'fruit.season': regExp},
{'fruit.treeType': regExp}
]};
return Basket.find(selector, options).fetch();
} else {
return Basket.find({}, options).fetch();
}
});
function buildRegExp(searchText) {
// this is a dumb implementation
var parts = searchText.trim().split(/[ \-\:]+/);
return new RegExp("(" + parts.join('|') + ")", "ig");
}
and my client code:
//Client.js
Template.dispResults.helpers({
getPackages_fruit: function() {
return PackageSearch_fruit.getData({
transform: function(matchText, regExp) {
return matchText.replace(regExp, "<b>$&</b>")
},
sort: {isoScore: -1}
});
}
});
Thanks in advance!
I've modified the .split pattern so that it ignores everything between double quotes.
/[ \-\:]+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)/
Thus, you can simply wrap an exact phrase search in double quotes and it won't get split.
There is one more thing; since we don't need the quotes, they are removed in the next line using a .map function with a regex that replaces double quotes at the start or the end of a string part: /^"|"$/
Sample code:
function buildRegExp(searchText) {
// exact phrase search in double quotes won't get split
var arr = searchText.trim().split(/[ \-\:]+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)/);
var parts = arr.map(function(x){return x.replace(/^"|"$/g, '');});
return new RegExp("(" + parts.join('|') + ")", "ig");
}
console.log(buildRegExp("foo bar"));
console.log(buildRegExp("\"foo bar\""));
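The same split-outside-quotes idea can be sketched in Java. This is only an illustration under assumptions: the class name `PhraseSearch` is hypothetical, the search text is assumed to be non-empty, and, unlike the JavaScript above, each part is passed through `Pattern.quote` so that regex metacharacters typed by the user are matched literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PhraseSearch {
    // Separators (space, '-', ':') followed by an even number of double quotes,
    // i.e. separators that are not inside a quoted phrase.
    private static final String SPLIT_OUTSIDE_QUOTES =
        "[ \\-:]+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static Pattern buildPattern(String searchText) {
        List<String> parts = new ArrayList<>();
        for (String raw : searchText.trim().split(SPLIT_OUTSIDE_QUOTES)) {
            // Drop the surrounding quotes of an exact phrase.
            String part = raw.replaceAll("^\"|\"$", "");
            if (!part.isEmpty()) {
                // Quote each part so user input is matched literally.
                parts.add(Pattern.quote(part));
            }
        }
        return Pattern.compile("(" + String.join("|", parts) + ")",
                               Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        System.out.println(buildPattern("foo bar"));
        System.out.println(buildPattern("\"foo bar\""));
    }
}
```

With the quoted input, the whole phrase stays one alternative, so text containing only "foo" no longer matches.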

String replace with dictionary exception handling

I've implemented the answer here to do token replacements of a string:
https://stackoverflow.com/a/1231815/1224021
My issue now is when this method finds a token whose value is not in the dictionary. I get the exception "The given key was not present in the dictionary." and just return the original string. What I'd like to happen, obviously, is that all the good tokens get replaced while the offending one remains au naturel. I'm guessing I'll need a loop instead of the one-line regex replace? I'm using VB.NET. Here's what I'm currently doing:
Shared ReadOnly re As New Regex("\$(\w+)\$", RegexOptions.Compiled)
Public Shared Function GetTokenContent(ByVal val As String) As String
Dim retval As String = val
Try
If Not String.IsNullOrEmpty(val) AndAlso val.Contains("$") Then
Dim args = GetRatesDictionary()
retval = re.Replace(val, Function(match) args(match.Groups(1).Value))
End If
Catch ex As Exception
' not sure how to handle?
End Try
Return retval
End Function
The exception is likely thrown in the line
retval = re.Replace(val, Function(match) args(match.Groups(1).Value))
because this is the only place you are keying the dictionary. Make use of the Dictionary.ContainsKey method before accessing it, and fall back to the matched text so that an unknown token stays untouched.
retval = re.Replace(val,
Function(match)
Return If(args.ContainsKey(match.Groups(1).Value), args(match.Groups(1).Value), match.Value)
End Function)
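The same guard can be sketched in Java for comparison. This is an illustration only: the class name `TokenReplacer` and the sample map are hypothetical, and it relies on Java 9+ `Matcher.replaceAll` with a function argument. Unknown tokens fall back to the matched text, so they are left in place.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenReplacer {
    // Same token syntax as in the question: $name$
    private static final Pattern TOKEN = Pattern.compile("\\$(\\w+)\\$");

    static String replaceTokens(String input, Map<String, String> values) {
        return TOKEN.matcher(input).replaceAll(m ->
            // Unknown key: fall back to the whole match so the token stays as-is.
            // quoteReplacement keeps '$' in the result from being read as a group reference.
            Matcher.quoteReplacement(values.getOrDefault(m.group(1), m.group())));
    }

    public static void main(String[] args) {
        Map<String, String> rates = Map.of("rate", "1.5"); // hypothetical sample data
        System.out.println(replaceTokens("Rate $rate$, other $missing$", rates));
    }
}
```

No exception handling is needed for the missing key, because the lookup never throws.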
This is what I got to work vs. the regex, which was also a suggestion on the original thread by Allen Wang: https://stackoverflow.com/a/7957728/1224021
Public Shared Function GetTokenContent(ByVal val As String) As String
Dim retval As String = val
Try
If Not String.IsNullOrEmpty(val) AndAlso val.Contains("$") Then
Dim args = GetRatesDictionary("$")
retval = args.Aggregate(val, Function(current, value) current.Replace(value.Key, value.Value))
End If
Catch ex As Exception
End Try
Return retval
End Function
I know it's been a while since this question was answered, but FYI for anyone wanting to still use the Regex / Dictionary match approach, the following works (based on the sample in the OP question):
retVal = re.Replace(formatString,
match => args.ContainsKey(match.Groups[1].Captures[0].Value)
? args[match.Groups[1].Captures[0].Value]
: string.Empty);
... or my full sample as a string extension method is:
public static class StringExtensions
{
// Will replace parameters enclosed in double curly braces
private static readonly Lazy<Regex> ParameterReplaceRegex = new Lazy<Regex>(() => new Regex(@"\{\{(?<key>\w+)\}\}", RegexOptions.Compiled));
public static string InsertParametersIntoFormatString(this string formatString, string parametersJsonArray)
{
if (parametersJsonArray != null)
{
var deserialisedParamsDictionary = JsonConvert.DeserializeObject<Dictionary<string, string>>(parametersJsonArray);
formatString = ParameterReplaceRegex.Value.Replace(formatString,
match => deserialisedParamsDictionary.ContainsKey(match.Groups[1].Captures[0].Value)
? deserialisedParamsDictionary[match.Groups[1].Captures[0].Value]
: string.Empty);
}
return formatString;
}
}
There are a few things to note here:
1) My parameters are passed in as a JSON object (despite the parameter name), e.g.: {"ProjectCode":"12345","AnotherParam":"Hi there!"}
2) The actual template / format string to do the replacements on has the parameters enclosed in double curly braces: "This is the Project Code: {{ProjectCode}}, this is another param {{AnotherParam}}"
3) Regex is both Lazy initialized and Compiled to suit my particular use case of:
the screen this code serves may not be used often
but once it is, it will get heavy use
so it should be as efficient on subsequent calls as possible.

Json regex deserialization with JSON.Net

I'm trying to read the following example json from a text file into a string using the JSON.Net parsing library.
Content of C:\temp\regeLib.json
{
"Regular Expressions Library":
{
"SampleRegex":"^(?<FIELD1>\d+)_(?<FIELD2>\d+)_(?<FIELD3>[\w\&-]+)_(?<FIELD4>\w+).txt$"
}
}
Example code to try and parse:
Newtonsoft.Json.Converters.RegexConverter rConv = new Newtonsoft.Json.Converters.RegexConverter();
using (StreamReader reader = File.OpenText(libPath))
{
string foo = reader.ReadToEnd();
JObject jo = JObject.Parse(foo);//<--ERROR
//How to use RegexConverter to parse??
Newtonsoft.Json.JsonTextReader jtr = new Newtonsoft.Json.JsonTextReader(reader);
JObject test = rConv.ReadJson(jtr);//<--Not sure what parameters to provide
string sampleRegex = test.ToString();
}
It seems I need to use the converter, I know the code above is wrong, but I can't find any examples that describe how / if this can be done. Is it possible to read a regular expression token from a text file to a string using JSON.Net? Any help is appreciated.
UPDATE:
Played with it more and figured out I had to escape the character classes, once I made the correction below I was able to parse to a JObject and use LINQ to query for the regex pattern.
Corrected content C:\temp\regeLib.json
{
"Regular Expressions Library":
{
"SampleRegex":"^(?<FIELD1>\\d+)_(?<FIELD2>\\d+)_(?<FIELD3>[\\w\\&-]+)_(?<FIELD4>\\w+).txt$"
}
}
Corrected code
using (StreamReader reader = File.OpenText(libPath))
{
string content = reader.ReadToEnd().Trim();
JObject regexLib = JObject.Parse(content);
string sampleRegex = regexLib["Regular Expressions Library"]["SampleRegex"].ToString();
//Which then lets me do the following...
Regex rSampleRegex = new Regex(sampleRegex);
foreach (string sampleFilePath in Directory.GetFiles(dirSampleFiles, "*"))
{
string filename = Path.GetFileName(sampleFilePath);
if (rSampleRegex.IsMatch(filename))
{
//Do stuff...
}
}
}
Not sure if this is the best approach, but it seems to work for my case.
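The reason the fix works is that JSON treats the backslash as an escape character, and sequences such as \d or \w are not valid JSON escapes, so every backslash in the regex has to be doubled in the file. A minimal Java sketch of that escaping step, assuming a hypothetical helper `toJsonStringLiteral` that handles only backslashes and double quotes (it is not a full JSON encoder):

```java
class JsonRegexEscape {
    // Minimal escaping for embedding a regex in a JSON string literal.
    // Only backslashes and double quotes are handled, not control characters.
    static String toJsonStringLiteral(String regex) {
        // Backslashes first, so the backslash added for a quote is not doubled again.
        return "\"" + regex.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // Raw regex text: ^(?<FIELD1>\d+)_(?<FIELD4>\w+).txt$
        String regex = "^(?<FIELD1>\\d+)_(?<FIELD4>\\w+).txt$";
        System.out.println(toJsonStringLiteral(regex));
    }
}
```

When the regex is written to the file by a JSON serializer instead of by hand, this escaping is done automatically.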
I don't understand why you have to store such a small regex in a JSON file. Are you going to expand the regex in the future?
If so, rather than doing this
JObject regexLib = JObject.Parse(content);
string sampleRegex = regexLib["Regular Expressions Library"]["SampleRegex"].ToString();
consider using json2csharp to generate classes; at least they are strongly typed, which makes the code more maintainable.
I think a more appropriate JSON would look like this (assumptions):
{
"Regular Expressions Library": [
{
"SampleRegex": "^(?<FIELD1>\\d+)_(?<FIELD2>\\d+)_(?<FIELD3>[\\w\\&-]+)_(?<FIELD4>\\w+).txt$"
},
{
"SampleRegex2": "^(?<FIELD1>\\d+)_(?<FIELD2>\\d+)_(?<FIELD3>[\\w\\&-]+)_(?<FIELD4>\\w+).txt$"
}
]
}
It would make more sense this way to store a regex in a "settings" file

How do I check if a filename matches a wildcard pattern

I've got a wildcard pattern, perhaps "*.txt" or "POS??.dat".
I also have list of filenames in memory that I need to compare to that pattern.
How would I do that, keeping in mind I need exactly the same semantics that IO.DirectoryInfo.GetFiles(pattern) uses.
EDIT: Blindly translating this into a regex will NOT work.
I have a complete answer in code for you that's 95% like FindFiles(string).
The 5% that isn't there is the short names/long names behavior in the second note on the MSDN documentation for this function.
If you would still like to get that behavior, you'll have to complete a computation of the short name of each string you have in the input array, and then add the long name to the collection of matches if either the long or short name matches the pattern.
Here is the code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace FindFilesRegEx
{
class Program
{
static void Main(string[] args)
{
string[] names = { "hello.t", "HelLo.tx", "HeLLo.txt", "HeLLo.txtsjfhs", "HeLLo.tx.sdj", "hAlLo20984.txt" };
string[] matches;
matches = FindFilesEmulator("hello.tx", names);
matches = FindFilesEmulator("H*o*.???", names);
matches = FindFilesEmulator("hello.txt", names);
matches = FindFilesEmulator("lskfjd30", names);
}
public static string[] FindFilesEmulator(string pattern, string[] names)
{
List<string> matches = new List<string>();
Regex regex = FindFilesPatternToRegex.Convert(pattern);
foreach (string s in names)
{
if (regex.IsMatch(s))
{
matches.Add(s);
}
}
return matches.ToArray();
}
internal static class FindFilesPatternToRegex
{
private static Regex HasQuestionMarkRegEx = new Regex(@"\?", RegexOptions.Compiled);
private static Regex IllegalCharactersRegex = new Regex("[" + @"\/:<>|" + "\"]", RegexOptions.Compiled);
private static Regex CatchExtentionRegex = new Regex(@"^\s*.+\.([^\.]+)\s*$", RegexOptions.Compiled);
private static string NonDotCharacters = @"[^.]*";
public static Regex Convert(string pattern)
{
if (pattern == null)
{
throw new ArgumentNullException();
}
pattern = pattern.Trim();
if (pattern.Length == 0)
{
throw new ArgumentException("Pattern is empty.");
}
if(IllegalCharactersRegex.IsMatch(pattern))
{
throw new ArgumentException("Pattern contains illegal characters.");
}
bool hasExtension = CatchExtentionRegex.IsMatch(pattern);
bool matchExact = false;
if (HasQuestionMarkRegEx.IsMatch(pattern))
{
matchExact = true;
}
else if(hasExtension)
{
matchExact = CatchExtentionRegex.Match(pattern).Groups[1].Length != 3;
}
string regexString = Regex.Escape(pattern);
regexString = "^" + Regex.Replace(regexString, @"\\\*", ".*");
regexString = Regex.Replace(regexString, @"\\\?", ".");
if(!matchExact && hasExtension)
{
regexString += NonDotCharacters;
}
regexString += "$";
Regex regex = new Regex(regexString, RegexOptions.Compiled | RegexOptions.IgnoreCase);
return regex;
}
}
}
}
You can simply do this. You do not need regular expressions.
using Microsoft.VisualBasic.CompilerServices;
if (Operators.LikeString("pos123.txt", "pos?23.*", CompareMethod.Text))
{
Console.WriteLine("Filename matches pattern");
}
Or, in VB.Net,
If "pos123.txt" Like "pos?23.*" Then
Console.WriteLine("Filename matches pattern")
End If
In c# you could simulate this with an extension method. It wouldn't be exactly like VB Like, but it would be like...very cool.
You could translate the wildcards into a regular expression:
*.txt -> ^.*\.txt$
POS??.dat -> ^POS..\.dat$
Use the Regex.Escape method to escape the characters that are not wildcards into literal strings for the pattern (e.g. converting ".txt" to "\.txt").
The wildcard * translates into .* (it can also match zero characters), and ? translates into .
Put ^ at the beginning of the pattern to match the beginning of the string, and $ at the end to match the end of the string.
Now you can use the Regex.IsMatch method to check if a file name matches the pattern.
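The translation steps can be sketched as follows. This is a Java illustration under assumptions: the class name `WildcardMatcher` is hypothetical, `*` is mapped to `.*` (zero or more characters), matching is case-insensitive, and none of the 8.3 short-name special cases of GetFiles are reproduced.

```java
import java.util.regex.Pattern;

class WildcardMatcher {
    // Translates a wildcard pattern into an anchored, case-insensitive regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder("^");
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");           // any run of characters, possibly empty
            } else if (c == '?') {
                sb.append('.');            // exactly one character
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        sb.append('$');
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE);
    }

    static boolean matches(String wildcard, String fileName) {
        return toRegex(wildcard).matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("*.txt", "HeLLo.TXT"));
        System.out.println(matches("POS??.dat", "POS123.dat"));
    }
}
```

Quoting every non-wildcard character plays the role that Regex.Escape plays in .NET, so a dot in the file name pattern only matches a literal dot.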
Just call the Windows API function PathMatchSpecExW().
[Flags]
public enum MatchPatternFlags : uint
{
Normal = 0x00000000, // PMSF_NORMAL
Multiple = 0x00000001, // PMSF_MULTIPLE
DontStripSpaces = 0x00010000 // PMSF_DONT_STRIP_SPACES
}
class FileName
{
[DllImport("Shlwapi.dll", SetLastError = false)]
static extern int PathMatchSpecExW([MarshalAs(UnmanagedType.LPWStr)] string file,
[MarshalAs(UnmanagedType.LPWStr)] string spec,
MatchPatternFlags flags);
/*******************************************************************************
* Function: MatchPattern
*
* Description: Matches a file name against one or more file name patterns.
*
* Arguments: file - File name to check
* spec - Name pattern(s) to search for
* flags - Flags to modify search condition (MatchPatternFlags)
*
* Return value: Returns true if name matches the pattern.
*******************************************************************************/
public static bool MatchPattern(string file, string spec, MatchPatternFlags flags)
{
if (String.IsNullOrEmpty(file))
return false;
if (String.IsNullOrEmpty(spec))
return true;
int result = PathMatchSpecExW(file, spec, flags);
return (result == 0);
}
}
Some kind of regex/glob is the way to go, but there are some subtleties; your question indicates you want identical semantics to IO.DirectoryInfo.GetFiles. That could be a challenge, because of the special cases involving 8.3 vs. long file names and the like. The whole story is on MSDN.
If you don't need an exact behavioral match, there are a couple of good SO questions:
glob pattern matching in .NET
How to implement glob in C#
For anyone who comes across this question now that it is years later, I found over at the MSDN social boards that the GetFiles() method will accept * and ? wildcard characters in the searchPattern parameter. (At least in .Net 3.5, 4.0, and 4.5)
Directory.GetFiles(string path, string searchPattern)
http://msdn.microsoft.com/en-us/library/wz42302f.aspx
Please try the code below.
static void Main(string[] args)
{
string _wildCardPattern = "*.txt";
List<string> _fileNames = new List<string>();
_fileNames.Add("text_file.txt");
_fileNames.Add("csv_file.csv");
Console.WriteLine("\nFilenames that matches [{0}] pattern are : ", _wildCardPattern);
foreach (string _fileName in _fileNames)
{
CustomWildCardPattern _pattern = new CustomWildCardPattern(_wildCardPattern);
if (_pattern.IsMatch(_fileName))
{
Console.WriteLine("{0}", _fileName);
}
}
}
public class CustomWildCardPattern : Regex
{
public CustomWildCardPattern(string wildCardPattern)
: base(WildcardPatternToRegex(wildCardPattern))
{
}
public CustomWildCardPattern(string wildcardPattern, RegexOptions regexOptions)
: base(WildcardPatternToRegex(wildcardPattern), regexOptions)
{
}
private static string WildcardPatternToRegex(string wildcardPattern)
{
string patternWithWildcards = "^" + Regex.Escape(wildcardPattern).Replace("\\*", ".*");
patternWithWildcards = patternWithWildcards.Replace("\\?", ".") + "$";
return patternWithWildcards;
}
}
For searching against a specific pattern, it might be worth using File Globbing which allows you to use search patterns like you would in a .gitignore file.
See here: https://learn.microsoft.com/en-us/dotnet/core/extensions/file-globbing
This allows you to add both inclusions & exclusions to your search.
Please see below the example code snippet from the Microsoft Source above:
Matcher matcher = new Matcher();
matcher.AddIncludePatterns(new[] { "*.txt" });
IEnumerable<string> matchingFiles = matcher.GetResultsInFullPath(filepath);
The use of RegexOptions.IgnoreCase will fix it.
public class WildcardPattern : Regex {
public WildcardPattern(string wildCardPattern)
: base(ConvertPatternToRegex(wildCardPattern), RegexOptions.IgnoreCase) {
}
public WildcardPattern(string wildcardPattern, RegexOptions regexOptions)
: base(ConvertPatternToRegex(wildcardPattern), regexOptions) {
}
private static string ConvertPatternToRegex(string wildcardPattern) {
string patternWithWildcards = Regex.Escape(wildcardPattern).Replace("\\*", ".*");
patternWithWildcards = string.Concat("^", patternWithWildcards.Replace("\\?", "."), "$");
return patternWithWildcards;
}
}