How to detect text inside specific characters in Dart - regex

I am trying to create a function that detects whether some text is enclosed by a special symbol or character, so if I have this String:
String text = 'This is normal String, <b> This is Bold String <b>';
I would get a List of Maps like this:
[ {'This is Normal String' : 'normal'} ,
{'This is Bold String' : 'bold'} ]
Then I can render it with RichText.
What I have tried is splitting the text like this: List<String> list = text.split('<b>');
and making the even-indexed items of the list bold, but that does not behave the way I want if the bold tag is at the front of the text. And what if I need to detect another tag, like <i>? Is there any way to do this?
Thank you

var string = 'I am here';
string.contains('h');// true
//You can use Regex to find patterns inside a string
string.contains(new RegExp(r'[A-Z]')); // true
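To go from "does it contain a tag" to the list of styled segments the question asks for, one option is to split on the tags while keeping them, then toggle the current style each time a tag is seen. The sketch below is in Python rather than Dart, and it assumes, as in the question's sample, that the same tag both opens and closes a run; the names split_styled and TAG_STYLES are made up for illustration.

```python
import re

# Hypothetical mapping from tag to style name; extend it for other tags.
TAG_STYLES = {"<b>": "bold", "<i>": "italic"}


def split_styled(text):
    """Return a list of (segment, style) pairs in reading order."""
    segments = []
    style = "normal"
    # The capturing group makes re.split keep the tags in its result.
    for piece in re.split(r"(<[bi]>)", text):
        if piece in TAG_STYLES:
            tag_style = TAG_STYLES[piece]
            # Seeing the active tag again closes it; otherwise it opens.
            style = "normal" if style == tag_style else tag_style
        elif piece.strip():
            segments.append((piece.strip(), style))
    return segments


print(split_styled("This is normal String, <b> This is Bold String <b>"))
```

Because empty pieces are skipped, a tag at the very start of the text no longer shifts the indices, which is the case the plain split('<b>') approach gets wrong.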


How can I do a regex replace using a List as the possible match entries?

I have a list of terms which I want to match as follows:
final List _emotions = [
'~~wink~~',
'~~bigsmile~~',
'~~sigh~~',
];
And a second list of replacements:
final List _replacements = [
'0.gif',
'1.gif',
'2.gif',
];
So that if I have the text:
var text = "I went to the store and got a ~~bigsmile~~";
I could have it replace the text as
I went to the store and got a <img src="1.gif" />
So essentially, I was thinking of running a regex replace on my text variable, but the search pattern would be based on my _emotions List.
Forming the replacement text should be easy, but I'm not sure how I could use the list as the basis for the search terms.
How is this possible in Dart?
You need to merge the two string lists into a single Map<String, String> that will serve as a dictionary (make sure the _emotions strings are in lower case, since you want case-insensitive matching), and then join the _emotions strings into a single alternation-based pattern.
After getting a match, use String#replaceAllMapped to find the right replacement for the found emotion.
Note you can shorten the pattern if you factor in the ~~ delimiters (see code snippet below). You might also apply more advanced techniques for the vocabulary, like regex tries (see my YT video on this topic).
final List<String> _emotions = [
'wink',
'bigsmile',
'sigh',
];
final List<String> _replacements = [
'0.gif',
'1.gif',
'2.gif',
];
Map<String, String> map = Map.fromIterables(_emotions, _replacements);
String text = "I went to the store and got a ~~bigsmile~~";
RegExp regex = RegExp("~~(${_emotions.join('|')})~~", caseSensitive: false);
print(text.replaceAllMapped(regex, (m) => '<img src="${map[m[1]?.toLowerCase()]}" />'));
Output:
I went to the store and got a <img src="1.gif" />
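For comparison, here is roughly the same dictionary-plus-alternation idea as a Python sketch. It assumes the two lists line up index by index, as in the question; replace_emotions and lookup are illustrative names, and re.escape is added in case a term ever contains regex metacharacters.

```python
import re

emotions = ["wink", "bigsmile", "sigh"]
replacements = ["0.gif", "1.gif", "2.gif"]

# Dictionary from (lower-case) emotion name to its image file.
lookup = dict(zip(emotions, replacements))

# One alternation built from the list, with the ~~ delimiters factored out.
pattern = re.compile(
    "~~(" + "|".join(re.escape(e) for e in emotions) + ")~~",
    re.IGNORECASE,
)


def replace_emotions(text):
    """Swap each ~~name~~ marker for an <img> tag using the lookup table."""
    return pattern.sub(
        lambda m: '<img src="{}" />'.format(lookup[m.group(1).lower()]),
        text,
    )


print(replace_emotions("I went to the store and got a ~~bigsmile~~"))
```

Lower-casing the captured name before the lookup is what keeps the case-insensitive match from failing with a missing key.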

compare list items against another list

So let's say I have a 3-item list:
myString = "prop zebra cool"
items = myString.split(" ")
#items = ["prop", "zebra", "cool"]
And another list, content, containing hundreds of string items. It's actually a list of files.
Now I want to get only the items of content that contain all of the items.
So I started this way:
assets = []
for c in content:
    for item in items:
        if item in c:
            assets.append(c)
And then somehow isolate only the items that are duplicated in the assets list.
This would work, but I don't like it; it's not elegant. I'm sure there is some other way to deal with this in Python.
If I interpret your question correctly, you can use all.
In your case, assuming:
content = [
"z:/prop/zebra/rig/cool_v001.ma",
"sjasdjaskkk",
"thisIsNoGood",
"shakalaka",
"z:/prop/zebra/rig/cool_v999.ma"
]
string = "prop zebra cool"
You can do the following:
assets = []
matchlist = string.split(' ')
for c in content:
    # keep c only if every search word occurs somewhere in it
    if all(s in c for s in matchlist):
        assets.append(c)
print(assets)
Alternative Method
If you want more control (i.e. you want to match only strings where your words appear in the specified order), then you could go with regular expressions:
import re
# convert content to a single, tab-separated, string
contentstring = '\t'.join(content)
# generate a regex string to match
matchlist = [r'(?:{0})[^\t]+'.format(s) for s in string.split(' ')]
matchstring = r'([^\t]+{0})'.format(''.join(matchlist))
assets = re.findall(matchstring, contentstring)
print(assets)
Assuming \t does not appear in the strings of content, you can use it as a separator and join the list into a single string (obviously, you can pick any other separator that better suits you).
Then you can build your regex so that it matches any substring containing your words and any other character, except \t.
In this case, matchstring results in:
([^\t]+(?:prop)[^\t]+(?:zebra)[^\t]+(?:cool)[^\t]+)
where:
(?:word) means that word is matched but not returned
[^\t]+ matches one or more characters other than \t
the outer () will return whole strings matching your rule (in this case z:/prop/zebra/rig/cool_v001.ma and z:/prop/zebra/rig/cool_v999.ma)
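If joining everything into one big string feels heavy, a possible variant of the ordered match is to test each item on its own. This is only a sketch: filter_in_order is a made-up name, and unlike the pattern above it uses .* between the words, so it does not require extra characters before, between, or after them.

```python
import re


def filter_in_order(content, words):
    """Keep the strings in which all words occur in the given order."""
    # re.escape guards against words that contain regex metacharacters.
    pattern = re.compile(".*".join(re.escape(w) for w in words))
    return [c for c in content if pattern.search(c)]


content = [
    "z:/prop/zebra/rig/cool_v001.ma",
    "sjasdjaskkk",
    "thisIsNoGood",
    "shakalaka",
    "z:/prop/zebra/rig/cool_v999.ma",
]
print(filter_in_order(content, "prop zebra cool".split(" ")))
```

Since each string is searched separately, there is no need to pick a separator character that must not appear in the data.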

Splitting a string based on positions with regex

I need to convert this (date) String "12112014" to "12.11.2014".
What I would like to do is:
Take the first 2 characters, "12", and add ".",
then take characters 3-4 to get "11" and add ".",
and finally take the last 4 characters (5-8) to get "2014".
I already found out how to get the first 2 characters ( "^\d{2}" ), but I failed to get characters based on a position.
Whatever the programming language, you should try to capture the groups of digits from the string and then join them with a ".".
In Perl, it can be done as:
$_ = '12112014';
s/(\d{2})(\d{2})(\d{4})/$1.$2.$3/;
print "$_";
Without you specifying the language you're after, I've picked javascript:
var s = '12012011';
var s2 = s.replace(/(\d{2})(\d{2})(\d{4})/, '$1.$2.$3');
console.log(s2); // prints "12.01.2011"
The gist of it is that you use () to specify groups inside your regular expression and then can use the groups in your replace expression.
Same in Java:
String s = "12012011";
String s2 = s.replaceAll("(\\d{2})(\\d{2})(\\d{4})", "$1.$2.$3");
System.out.println(s2);
I don't think you can do that with split alone.
You could expand your expression to:
"(^(\d{2})(\d{2})(\d{4}))"
Then access the groups with the Regex language of your choice and build the string you want.
Note that, regex aside, you could alternatively parse the original string into a strongly typed Date or DateTime value and output it using the appropriate locale.
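Both routes, capture groups and parsing into a real date type, can be sketched side by side in Python. This assumes the input is always eight digits in day-month-year order; the function names are made up for illustration.

```python
import re
from datetime import datetime


def dotted_with_regex(s):
    """Insert dots using three capture groups: 2, 2 and 4 digits."""
    return re.sub(r"^(\d{2})(\d{2})(\d{4})$", r"\1.\2.\3", s)


def dotted_with_date(s):
    """Parse as a real date (assumed day-month-year), then reformat."""
    return datetime.strptime(s, "%d%m%Y").strftime("%d.%m.%Y")


print(dotted_with_regex("12112014"))
print(dotted_with_date("12112014"))
```

The date-based version has the side effect of rejecting impossible dates such as a 13th month, which the pure regex version passes through unchanged.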

Parsing as string of data but leaving out quotes

I need to use RegEx to run through a string of text but only return the parts that I need. Let's say for example the string is as follows:
1234,Weapon Types,100,Handgun,"This is the text, "and", that is all."""
\d*,Weapon Types,(\d*),(\w+), gets me most of the way, however it is the last part that I am having an issue with. Is there a way for me to capture the rest of the string i.e.
"This is the text, "and", that is all."""
without picking up the quotes? I've tried negating them, however it just stops the string at the quote.
Please keep in mind that the text for this string is unknown so doing literal matches will not work.
You've given us something very difficult to solve. It's okay that you have nested commas inside your string: once we come across a double quote, we can ignore everything until the end quote, which gobbles up the commas.
But how will your parser know whether the next double quote ends the string or is a nested double quote?
If I could slightly modify your input string to make it clear what is a nested quote, then parsing is easy...
var txt = "1234,Weapon Types,100,Handgun,\"This is the text, 'and', that is all.\",other stuff";
var m = Regex.Match(txt, @"^\d*,Weapon Types,(\d*),(\w+),""([^""]+)""");
MessageBox.Show(m.Groups[3].Value);
But if your input string must have nested quotes like that, then we must come up with some other rule for detecting what is the real end of the string. How about this?
var txt = "1234,Weapon Types,100,Handgun,\"This is the text, \"and\", that is all.\",other stuff";
var m = Regex.Match(txt, @"^\d*,Weapon Types,(\d*),(\w+),""(.+)"",");
MessageBox.Show(m.Groups[3].Value);
The result is...
This is the text, "and", that is all.
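The second rule (the real closing quote is the last one that is followed by a comma) can be tried out in Python as well. This is a sketch under the same assumption as above, namely that another field follows the quoted text; parse_line and FIELD_RE are illustrative names.

```python
import re

# Sample line with nested quotes inside the quoted field.
line = '1234,Weapon Types,100,Handgun,"This is the text, "and", that is all.",other stuff'

# Greedy (.+) runs to the LAST quote that is directly followed by a comma.
FIELD_RE = re.compile(r'^\d*,Weapon Types,(\d*),(\w+),"(.+)",')


def parse_line(s):
    """Return (number, type, text) or None if the line does not match."""
    m = FIELD_RE.match(s)
    return m.groups() if m else None


print(parse_line(line))
```

The greediness is what makes this work, and also its weak spot: if a later field contains a quote followed by a comma, the capture will run on into it.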

Problem with Actionscript Regular Expressions

I have to parse out color information from HTML data. The colors can either be RGB colors or file names to a swatch image.
I used http://www.gskinner.com/RegExr/ to develop and test the patterns. I copied the AS regular expression code verbatim from the tool into Flex Builder. But, when I exec the pattern against the string I get a null.
Here are the patterns and an example of the string (I took the correct HTML tags out so the strings would show correctly):
DIV data:
<div style="background-color:rgb(2,2,2);width:10px;height:10px;">
DIV pattern:
/([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/
IMG data:
<img src="/media/swatches/jerzeesbirch.gif" width="10" height="10" alt="Birch">
IMG pattern:
/[a-z0-9_-]+/[a-z0-9_-]+/[a-z0-9_-]+\.[a-z0-9_-]+/
Here's my Actionscript code:
var divPattern : RegExp = new RegExp("/([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/");
var imgPattern : RegExp = new RegExp("/[a-z0-9_-]+/[a-z0-9_-]+/[a-z0-9_-]+\.[a-z0-9_-]+/");
var divResult : Array = divPattern.exec(object.swatch);
var imgResult : Array = imgPattern.exec(object.swatch);
Both of the arrays are null.
This is my first foray into AS coding, so I think I'm declaring something wrong.
Steve
(I don't know ActionScript but I know Javascript and they should be close enough to solve your problem.)
To construct a RegExp object for e.g. the pattern ^[a-z]+$, you either use
var pattern : RegExp = new RegExp("^[a-z]+$");
or, better,
var pattern : RegExp = /^[a-z]+$/
The code new RegExp("/^[a-z]+$/") is wrong because the resulting pattern expects a literal slash before the ^ and after the $.
Therefore, your DIV pattern should be written as
var divPattern : RegExp = /([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/;
but since ( and ) are special characters used for capturing, you need to escape them to match literal parentheses:
var divPattern : RegExp = /\([0-9]{1,3},[0-9]{1,3},[0-9]{1,3}\)/;
For the IMG pattern, since / delimits a RegExp literal, you need to escape it as well:
var imgPattern : RegExp = /[a-z0-9_-]+\/[a-z0-9_-]+\/[a-z0-9_-]+\.[a-z0-9_-]+/
Finally, you could use \d in place of [0-9] and \w in place of [a-zA-Z0-9_].
I don't know enough to tell if your regex patterns are correct, but from the docs on the AS3 RegExp class, it looks like your new RegExp() call needs a second argument to declare flags for case sensitivity etc.
EDIT: Also, as Bart K has pointed out, you don't need the / delimiters when using the new method.
So you can use either:
var divPattern:RegExp = new RegExp("([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})", "");
OR you can also use the alternate syntax with /:
var divPattern:RegExp = /([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})/;
... in which case the flag string (if any) is included after the final /
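To check that the patterns themselves are fine once the delimiter problem is out of the way, here is a small Python sketch run against the sample data from the question. Python has no / delimiters, so the patterns are plain raw strings; the inner capture group for the RGB values and the find_swatch helper are additions made for illustration.

```python
import re

div_data = '<div style="background-color:rgb(2,2,2);width:10px;height:10px;">'
img_data = '<img src="/media/swatches/jerzeesbirch.gif" width="10" height="10" alt="Birch">'

# Escaped parentheses match the literal ( and ); the inner group captures the values.
DIV_RE = re.compile(r"\(([0-9]{1,3},[0-9]{1,3},[0-9]{1,3})\)")
IMG_RE = re.compile(r"[a-z0-9_-]+/[a-z0-9_-]+/[a-z0-9_-]+\.[a-z0-9_-]+")


def find_swatch(html):
    """Return ('rgb', values) or ('image', path), or None if neither matches."""
    m = DIV_RE.search(html)
    if m:
        return ("rgb", m.group(1))
    m = IMG_RE.search(html)
    if m:
        return ("image", m.group(0))
    return None


print(find_swatch(div_data))
print(find_swatch(img_data))
```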