NSString simple pattern matching - regex

Mac OS 10.6, Cocoa project, 10.4 compatibility required.
(Please note: my knowledge of regex is quite slight)
I need to parse NSStrings, for matching cases where the string contains an embedded tag, where the tag format is:
[xxxx]
Where xxxx are random characters.
e.g. "The quick brown [foxy] fox likes sox".
In the above case, I need to grab the string "foxy". (Or nil if no tag is found.)
Each string will only have one tag, and the tag can appear anywhere within the string, or may not appear at all.
Could someone please help with a way to do that, preferably without having to include another library such as RegexKit. Thank you for any help.

I'd suggest something like the following:
NSString *subString = nil;
// Locate the opening and closing brackets of the tag.
NSRange range1 = [myString rangeOfString:@"["];
NSRange range2 = [myString rangeOfString:@"]"];
// Proceed only if both were found and "]" comes after "[".
if ((range1.length == 1) && (range2.length == 1) && (range2.location > range1.location)) {
    NSRange range3;
    range3.location = range1.location + 1;
    range3.length = (range2.location - range1.location) - 1;
    subString = [myString substringWithRange:range3];
}
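For comparison, here is roughly what the same extraction looks like as a regular expression. This is only a sketch in Python to show the pattern itself; extract_tag is a made-up name, and nothing here is tied to a Cocoa API.

```python
import re

# Hypothetical helper (not part of any library): returns the text inside
# the first [...] tag, or None when the string has no tag.
TAG_PATTERN = re.compile(r"\[([^\]]*)\]")

def extract_tag(text):
    match = TAG_PATTERN.search(text)
    return match.group(1) if match else None

print(extract_tag("The quick brown [foxy] fox likes sox"))
```

The character class [^\]]* stops at the first closing bracket, which mirrors the "one tag per string" assumption in the question.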

Related

Refactoring starting place for regex

I have a function that strips HTML markup so the text can be displayed inside a text element.
stripChar: function stripChar(string) {
    string = string.replace(/<\/?[^>]+(>|$)/g, "");
    string = string.trim();
    string = string.replace(/(\n{2,})/gm, "\n\n");
    string = string.replace(/…/g, "...");
    string = string.replace(/&nbsp;/g, "");
    let changeencode = entities.decode(string);
    return changeencode;
}
This has worked great for me, but I have a new requirement and I'm struggling to work out where I should start refactoring the code above. I still need to strip out the above, but I have 2 exceptions:
List items, <ul><li>: I need to handle these so that they still appear as bullet points
Hyperlinks: I want to use react-native-hyperlink, so I need to leave the <a> tags intact for me to handle separately
Whilst the function is great for general tag replacement, it's less flexible for my needs above.
You may use
stripChar: function stripChar(string) {
    // Remove &nbsp; and every tag except <li>, <ul> and <a> (opening or closing).
    string = string.replace(/&nbsp;|<(?!\/?(?:li|ul|a)\b)\/?[^>]+(?:>|$)/g, "");
    string = string.trim();
    string = string.replace(/\n{2,}/g, "\n\n");
    string = string.replace(/…/g, "...");
    let changeencode = entities.decode(string);
    return changeencode;
}
The main changes:
.replace(/&nbsp;/g,"") is moved into the first replace
The first replace is now used with a new regex pattern where the li, ul and a tags are excluded from the matches using a negative lookahead (?!\/?(?:li|ul|a)\b).
See the updated regex demo here.
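To see what the negative lookahead does in isolation, here is a small sketch in Python (whose re module supports the same lookahead syntax). The function name strip_tags_except is made up for the example, and only the tag-matching part of the pattern is reproduced.

```python
import re

# Same idea as the JavaScript pattern: match any tag, unless the tag name
# (optionally preceded by "/") is li, ul or a.
TAG_EXCEPT_LIST_AND_LINK = re.compile(r"<(?!/?(?:li|ul|a)\b)/?[^>]+(?:>|$)")

def strip_tags_except(html):
    """Hypothetical helper: drop every tag except <li>, <ul> and <a>."""
    return TAG_EXCEPT_LIST_AND_LINK.sub("", html)

sample = '<p>Hello <a href="x">link</a></p><ul><li>item</li></ul>'
print(strip_tags_except(sample))
```

The \b after the alternation matters: without it, a tag such as <address> would be treated as an <a> tag and kept.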

Using Regex to parse ASCII protocol

I'm working on a simple application that interacts with a device via a Telnet session using an ASCII-based protocol.
There will be a lot of interaction with the device, so I'm looking for a fast way to parse the incoming string. The manufacturer was kind enough to release their regex scheme, but since regex is very new to me I don't understand how to retrieve the value. I know how to match, but when I match I want to get the value from it.
Regex scheme
NameAndValue := [A-Z_]+:("(\\.|[^"\\])*"|(\\.|[^\s"\\])*)
Value := ("(\\.|[^"\\])*"|(\\.|[^\s"\\])*)
ValueUnquoted := (\\.|[^\s"\\])*
ValueQuoted := "(\\.|[^"\\])*"
CharQuoted := (\\.|[^"\\])
CharUnquoted := (\\.|[^\s"\\])
EscapedChar := \\.
CharCommon := [^\s"\\]
CharEscape := \\
CharQuote := "
CharSpace := \s
Example of a response
CMD1:"string value" CMD2:1 CMD3:"string value again" <LF> or <CR>+<LF>
I've read a lot of documentation and tried lots of approaches; I hope someone can point me in the right direction.
I did, however, write a simple parser that finds the index positions of the commands and their values and then uses a substring to retrieve only the value. It works, but I'd prefer a "nicer" way using the power of regex.
--------- EDIT 18-10-2017 ---------
At the request of @VBobCat, here is a more detailed "parsing" requirement.
Let's say I have an object with the properties Foo and Bar, and a second object with the properties Cat and Dog.
When I receive the string via telnet I have to parse it into one of those objects. Luckily the string always begins with what it holds: let's say X for the object with Foo and Bar, and Animal for the object with Cat and Dog.
With the provided regex I want to parse the values in the string into the properties of the object. Something like:
X CMD1_Foo:1 CMD2_Bar:"string value" <LF> or <CR>+<LF>
Object X.Foo = CMD1_Foo.value
Object X.Bar = CMD2_Bar.value
OR
Animal CMD1_Cat:"Miauw" CMD2_Dog:"woef" <LF> or <CR>+<LF>
Object X.Cat = CMD1_Cat.value
Object X.Dog = CMD2_Dog.value
If all your samples are consistent with your example, this could work:
Function ParseTelnet(input As String) As DataTable
    Dim retTable As New DataTable
    retTable.Columns.Add("command", GetType(String))
    retTable.Columns.Add("value", GetType(String))
    ' Split on whitespace that is followed by the next "NAME:" token.
    Dim entries = System.Text.RegularExpressions.Regex.Split(input, "\s+(?=\w+:)")
    ' Trim tab, LF, CR and space, then split each entry at the first colon.
    Dim pairs = entries.Select(
        Function(entry) If(entry, "").Trim(Chr(9), Chr(10), Chr(13), Chr(32)).Split({":"c}, 2)).Where(
        Function(pair) pair.Count = 2)
    For Each pair In pairs
        If pair(1).StartsWith("""") AndAlso pair(1).EndsWith("""") Then
            ' Quoted value: drop the surrounding quotes.
            retTable.Rows.Add(pair(0), pair(1).Substring(1, pair(1).Length - 2))
        Else
            retTable.Rows.Add(pair(0), pair(1))
        End If
    Next
    Return retTable
End Function
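If you would rather lean on the manufacturer's NameAndValue pattern itself, capture groups are what give you the value back. Below is a rough sketch in Python, not in VB. It assumes the name part has to be widened from [A-Z_] to [A-Za-z0-9_], since the sample names such as CMD1_Foo contain digits and lowercase letters, and parse_pairs is a made-up name.

```python
import re

# Assumption: the vendor's NameAndValue pattern with two capture groups added
# and the name class widened so that names like CMD1_Foo match.
NAME_AND_VALUE = re.compile(
    r'([A-Za-z0-9_]+):("(?:\\.|[^"\\])*"|(?:\\.|[^\s"\\])*)'
)

def parse_pairs(line):
    """Hypothetical helper: return {name: value} for each NAME:value pair."""
    result = {}
    for match in NAME_AND_VALUE.finditer(line):
        name, value = match.group(1), match.group(2)
        # ValueQuoted: drop the surrounding quotes.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        # EscapedChar: turn \x back into x.
        result[name] = re.sub(r'\\(.)', r'\1', value)
    return result

print(parse_pairs('X CMD1_Foo:1 CMD2_Bar:"string value"\r\n'))
```

The leading X (or Animal) never matches because it is not followed by a colon, so it can be read separately to decide which object the values belong to.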

Matching PoS tags with specific text with `textacy.extract.pos_regex_matches(...)`

I'm using textacy's pos_regex_matches method to find certain chunks of text in sentences.
For instance, assuming I have the text: Huey, Dewey, and Louie are triplet cartoon characters., I'd like to detect that Huey, Dewey, and Louie is an enumeration.
To do so, I use the following code (on textacy 0.3.4, the version available at the time of writing):
import textacy
sentence = 'Huey, Dewey, and Louie are triplet cartoon characters.'
pattern = r'<PROPN>+ (<PUNCT|CCONJ> <PUNCT|CCONJ>? <PROPN>+)*'
doc = textacy.Doc(sentence, lang='en')
lists = textacy.extract.pos_regex_matches(doc, pattern)
for match in lists:
    print(match.text)
which prints:
Huey, Dewey, and Louie
However, if I have something like the following:
sentence = 'Donald Duck - Disney'
then the - (dash) is recognised as <PUNCT> and the whole sentence is recognised as a list -- which it isn't.
Is there a way to specify that only , and ; are valid <PUNCT> for lists?
I've looked for some reference about this regex language for matching PoS tags with no luck, can anybody help? Thanks in advance!
PS: I tried to replace <PUNCT|CCONJ> with <[;,]|CCONJ>, <;,|CCONJ>, <[;,]|CCONJ>, <PUNCT[;,]|CCONJ>, <;|,|CCONJ> and <';'|','|CCONJ> as suggested in the comments, but it didn't work...
In short, it is not possible: see this official page.
However, the merge request contains the code of the modified version described on that page, so one can recreate the functionality, although it performs worse than spaCy's Matcher (see code and example -- though I have no idea how to reimplement my problem using a Matcher).
If you want to go down this route anyway, you have to replace the line:
words.extend(map(lambda x: re.sub(r'\W', '', x), keyword_map[w]))
with the following:
words.extend(keyword_map[w])
otherwise every symbol (like , and ; in my case) will be stripped off.
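Another way to think about the problem, independent of textacy: encode each token as a single character and run an ordinary regex over the encoded string. The sketch below is plain Python and does not call textacy or spaCy at all; the (text, tag) pairs are typed in by hand as an assumption about what a tagger would produce, and encode and find_enumerations are made-up names.

```python
import re

# Only these punctuation marks count as list separators.
VALID_SEPARATORS = {",", ";"}

def encode(token_text, pos):
    """Map a token to one character: P = proper noun, S = separator, x = other."""
    if pos == "PROPN":
        return "P"
    if pos == "CCONJ" or (pos == "PUNCT" and token_text in VALID_SEPARATORS):
        return "S"
    return "x"

def find_enumerations(tagged_tokens):
    """Hypothetical helper: tagged_tokens is a list of (text, pos) pairs."""
    encoded = "".join(encode(text, pos) for text, pos in tagged_tokens)
    spans = []
    # Same shape as <PROPN>+ (<SEP> <SEP>? <PROPN>+)* on the encoded string.
    for match in re.finditer(r"P+(?:SS?P+)*", encoded):
        words = [text for text, _ in tagged_tokens[match.start():match.end()]]
        spans.append(" ".join(words))
    return spans

tokens = [("Donald", "PROPN"), ("Duck", "PROPN"), ("-", "PUNCT"), ("Disney", "PROPN")]
print(find_enumerations(tokens))
```

Because the dash is encoded as x rather than S, "Donald Duck" and "Disney" come back as two separate runs instead of one list.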

Regex in swift. A template for a specific numeric format

I am new to Swift; I have been working with it for only a few weeks, and now I am trying to parse something like a price list from an incoming string. It has the following format:
2.99 X 3.00 = 10 A
Some text here
1.22 X 1.5 10 A
The hardest part is that sometimes A or one of the digits is missing, but X is always in place.
I would like to find out how to use regex in Swift (or something similar, if it does not exist) to write a template for parsing the following value:
d.dd X d.d SomeValueIfExists
I would very much appreciate any useful information, topics to read or other resources to learn more about Swift.
PS. I have access to the dev. forums but I've never used them before.
I did an example recently, maybe a little harder than necessary, to demonstrate regex use in Swift:
let str1: NSString = "I run 12 miles"
let str2 = "I run 12 miles"
let match = str1.rangeOfString("\\d+", options: .RegularExpressionSearch)
let finalStr = str1.substringWithRange(match).toInt()
let n: Double = 2.2*Double(finalStr!)
let newStr = str2.stringByReplacingOccurrencesOfString("\\d+", withString: "\(n)", options: NSStringCompareOptions.RegularExpressionSearch, range: nil)
println(newStr) //I run 26.4 miles
Two of these use "RegularExpressionSearch". If you put this in a playground you can see what each line does. Note the double \ escapes: one for normal regex use and another because \ is a special character in Swift.
Also a good article:
http://benscheirman.com/2014/06/regex-in-swift/
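As for the "d.dd X d.d SomeValueIfExists" template from the question, the pattern itself might look something like the sketch below. It is written in Python rather than Swift, purely to show the regex, and it assumes the optional parts are an "=" sign, a third number and a single trailing capital letter; parse_price_line is a made-up name.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"
LINE_PATTERN = re.compile(
    rf"\s*({NUMBER})\s*X\s*({NUMBER})"  # d.dd X d.d (X is mandatory)
    rf"(?:\s*=?\s*({NUMBER}))?"         # optional "= 10" or just "10"
    rf"(?:\s+([A-Z]))?\s*"              # optional trailing letter such as A
)

def parse_price_line(line):
    """Hypothetical helper: return the four captured fields, or None."""
    match = LINE_PATTERN.fullmatch(line)
    return match.groups() if match else None

for line in ["2.99 X 3.00 = 10 A", "Some text here", "1.22 X 1.5 10 A"]:
    print(line, "->", parse_price_line(line))
```

Missing optional fields come back as None, so lines without the total or the letter still parse.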

String extraction

Currently I am working on a very basic game in C++. The game used to be a school project, but now that I am done with that programming class I want to expand my skills and put some more flourish on this old assignment.
I have already made a lot of changes that I am pleased with. I have centralized all the data into folder hierarchies and I have gotten the code to read those locations.
However my problem stems from a very fundamental flaw that has been stumping me.
In order to access the image data that I am using I have used the code:
string imageLocation = "..\\DATA\\Images\\";
string bowImage = imageLocation + "bow.png";
The problem is that when the player picks up an item on the gameboard my code is supposed to use the code:
hud.addLine("You picked up a " + (*itt)->name() + "!");
to print to the command line, "You picked up a Bow!". But instead it shows "You picked up a ..\DATA\Images\!".
Before I centralized my data I used to use:
name_(item_name.substr(0, item_name.find('.')))
in my Item class constructor to chop the item name to just something like bow or candle. After I changed how my data was structured I realized that I would have to change how I chop the name down to the same simple 'bow' or 'candle'.
I have changed the above code to reflect my changes in data structure to be:
name_(item_name.substr(item_name.find("..\\DATA\\Images\\"), item_name.find(".png")))
but unfortunately as I alluded to earlier this change of code is not working as well as I planned it to be.
So now that I have given that real long winded introduction to what my problem is, here is my question.
How do you extract the middle of a string between two sections that you do not want? Also that middle part that is your target is of an unknown length.
Thank you so very much for any help you guys can give. If you need anymore information please ask; I will be more than happy to upload part or even my entire code for more help. Again thank you very much.
In all honesty, you're probably approaching this from the wrong end.
Your item class should hold the string "bow" in a private member. The function Item::GetFilePath would then (at runtime) do "..\DATA\Images\" + this->name + ".png".
The fundamental property of the "bow" item object isn't the filename bow.png, but the fact that it's a "bow". The filename is just a derived property.
Assuming I understand you correctly, the short version of your question is: how do I split a string containing a file path so I have removed the path and the extension, leaving just the "title"?
You need the find_last_of method. This gets rid of the path:
std::string::size_type lastSlash = filePath.find_last_of('\\');
if (lastSlash == std::string::npos)
    fileName = filePath;
else
    fileName = filePath.substr(lastSlash + 1);
Note that you might want to define a constant as \\ in case you need to change it for other platforms. Not all OS file systems use \\ to separate path segments.
Also note that you also need to use find_last_of for the extension dot as well, because filenames in general can contain dots, throughout their paths. Only the very last one indicates the start of the extension:
std::string::size_type lastDot = fileName.find_last_of('.');
if (lastDot == std::string::npos)
{
    title = fileName;
}
else
{
    title = fileName.substr(0, lastDot);
    extension = fileName.substr(lastDot + 1);
}
See http://msdn.microsoft.com/en-us/library/3y5atza0(VS.80).aspx
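Put together, the two steps (cut at the last separator, then cut at the last dot) amount to the following. This is a sketch in Python only to show the logic end to end; rfind plays the role that find_last_of plays with a single character, and split_title is a made-up name.

```python
def split_title(file_path, separator="\\"):
    """Hypothetical helper: return (title, extension) for a file path."""
    # Step 1: drop everything up to and including the last separator.
    last_sep = file_path.rfind(separator)
    file_name = file_path if last_sep == -1 else file_path[last_sep + 1:]
    # Step 2: only the last dot in the file name starts the extension.
    last_dot = file_name.rfind(".")
    if last_dot == -1:
        return file_name, ""
    return file_name[:last_dot], file_name[last_dot + 1:]

print(split_title("..\\DATA\\Images\\bow.png"))
```

Doing the path step first matters here, because the leading ".." in the path would otherwise be mistaken for extension dots.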
Using boost filesystem:
#include "boost/filesystem.hpp"
namespace fs = boost::filesystem;
void some_function(void)
{
    string imageLocation = "..\\DATA\\Images\\";
    string bowImage = imageLocation + "bow.png";
    fs::path image_path( bowImage );
    // stem() is the filename without its extension
    hud.addLine("You picked up a " + image_path.stem().string() + "!"); //prints: You picked up a bow!
}
So combining Paul's and my thoughts, try something like this (broken down for readability):
const string prefix = "..\\DATA\\Images\\";
string::size_type start = item_name.find(prefix);
start = (start == string::npos) ? 0 : start + prefix.size(); // skip past the prefix if present
string::size_type dot = item_name.rfind(".png"); // last ".png" in the string
name_ = item_name.substr(start, dot - start); // second argument is a length, not a position
You could simplify it a bit if you know that the item name always starts with "..\DATA" etc. (you could store it in a constant and not need to search for it in the string)
Edit: Changed the extension-finding part to search from the end of the string, as suggested by EarWicker (this avoids the case where your path includes '.png' somewhere before the extension)
item_name.find("..\DATA\Images\") will return the index at which the substring "..\DATA\Images\" starts but it seems like you'd want the index where it ends, so you should add the length of "..\DATA\Images\" to the index returned by find.
Also, as hamishmcn pointed out, the second argument to substr should be the number of chars to return, which would be the index where ".png" starts minus the index where "..\DATA\Images\" ends, I think.
One thing that looks wrong is that the second parameter to substr should be the number of chars to copy, not the position.