Regex Python Problems with MS - regex

I am a newbie to regex and can't find the solution. I have searched for about 3 hours...
I have the text
HELLO MS. I HOPE YOU HAVE NO PROBLEMS.
And I want to get the result:
HELLO MISTRESS I HOPE YOU HAVE NO PROBLEMS.
But my code also replaces the "MS." at the end of "PROBLEMS.":
re.sub(r'(MS)+[.]', 'MISTRESS', text)
Thanks for your help.
Using Python 3.5.

Well, an immediate fix here would be to place a negative lookbehind before MS. to assert that it is preceded by whitespace or the start of the string:
import re
text = "HELLO MS. I HOPE YOU HAVE NO PROBLEMS."
output = re.sub(r'(?<!\S)(MS)+[.]', 'MISTRESS', text)
print(output)
However, for a more general solution, we might need to better understand the grammar behind which contexts should be replaced and which should not.
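To make the difference concrete, here is a small sketch that runs the question's pattern and the lookbehind version side by side on the same text. The variable names `naive` and `guarded` are only illustrative.

```python
import re

text = "HELLO MS. I HOPE YOU HAVE NO PROBLEMS."

# Pattern from the question: it also matches the "MS." that ends "PROBLEMS."
naive = re.sub(r'(MS)+[.]', 'MISTRESS', text)

# Lookbehind version: "MS." must not be preceded by a non-whitespace character
guarded = re.sub(r'(?<!\S)(MS)+[.]', 'MISTRESS', text)

print(naive)    # HELLO MISTRESS I HOPE YOU HAVE NO PROBLEMISTRESS
print(guarded)  # HELLO MISTRESS I HOPE YOU HAVE NO PROBLEMS.
```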

Another way, without regex, is to use a simple replace():
dictionary = {"MR.": "MISTER", "MS.": "MISTRESS"}
main_string = "HELLO MS. I HOPE YOU HAVE NO PROBLEMS WITH MR. X."
# Replace every occurrence of each abbreviation with its long form
for key, value in dictionary.items():
    main_string = main_string.replace(key, value)
print(main_string)
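One caveat: plain replace() knows nothing about word boundaries, so on the question's original text it would still rewrite the "MS." at the end of "PROBLEMS.". A possible way to keep the dictionary idea while avoiding that is to build one guarded regex from the keys. This is only a sketch; the `titles` mapping and the `expand_titles` helper are made-up names, not part of either answer.

```python
import re

# Hypothetical mapping; keys are the abbreviations without the trailing dot.
titles = {"MR": "MISTER", "MS": "MISTRESS"}

# (?<!\S) only allows a match at the start of the string or after whitespace.
pattern = re.compile(r"(?<!\S)(" + "|".join(map(re.escape, titles)) + r")\.")

def expand_titles(text):
    # Look up the matched abbreviation in the mapping for each replacement.
    return pattern.sub(lambda m: titles[m.group(1)], text)

main_string = "HELLO MS. I HOPE YOU HAVE NO PROBLEMS WITH MR. X."
print(expand_titles(main_string))
```

With this pattern a sentence ending in "PROBLEMS." is left untouched, because the "MS." there is preceded by a letter.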

Related

How would I duplicate a line and change a piece of text inside of the duplicate line?

I have recently run into a problem that I would like to automate with RegEx in Notepad++. I have some very limited experience with RegEx in N++, but I cannot figure out how to do the following:
I have the following line:
["Cost"] = 100,
And I want to achieve the following:
["Cost"] = 0,
["CostNew"] = 100,
Since I have many lines of "Cost" as portrayed above with varying values (I'm just using 100 as an example) I would need an automation for this process.
I am aware that you can create a new line by using "\n", but that is as far as my knowledge extends.
Is there a way of doing this with a RegEx expression? Or is it perhaps done better through multiple RegEx expressions?
Thank you in advance for reading my question!
Try this with "Match case" unticked:
Find what: (\["cos)([^,])
Replace with: $1t"] = 0,\n$1$2New
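If the file can also be processed outside Notepad++, the same duplicate-and-modify idea can be sketched with Python's re.sub. This assumes the values are plain integers and the key is spelled exactly "Cost"; the function name `split_cost_lines` is made up for illustration.

```python
import re

def split_cost_lines(text):
    # Capture the numeric value, emit a zeroed "Cost" line,
    # then a "CostNew" line that carries the original value.
    return re.sub(
        r'\["Cost"\] = (\d+),',
        r'["Cost"] = 0,\n["CostNew"] = \1,',
        text,
    )

print(split_cost_lines('["Cost"] = 100,'))
```

Each matching line is rewritten independently, so varying values are carried over through the captured group.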

fuzzy matching japanese strings in python?

This problem has had me stumped for the whole day.
I have two Japanese strings that I want to fuzzy match in Python 2.7. Currently I'm using fuzzywuzzy and
from fuzzywuzzy import process
jpnStr = "日本語".encode('utf-8')
jpnList = ["日本語1".encode('utf-8'),"日本語2".encode('utf-8'),"日本語3".encode('utf-8')]
bestmatch = process.extractOne(jpnStr, jpnList)
but the resulting bestmatch is always
("日本語1",0)
How would I go about resolving this issue, or is there a best practice that I'm totally missing here? Sorry if I sound frustrated, it's been a roadblock for a while. Thanks in advance.
OK, I'm not sure how helpful this is, but I've found a workaround.
I found that I could fuzzy match Japanese strings using fuzzywuzzy after all.
First, you get the Unicode Japanese string, i.e. "日本語です".
Then you write it out as ASCII text to a text file. The output will look something like "/uf34/ufeac/uewa3/..." and so on.
Then you read the text file back and compare the ASCII representations of the Japanese strings, such as "/uf34/ufeac/uewa3/", against each other. This gives a workable fuzzywuzzy match rating.
It's probably not an ideal method, but it works and is fairly accurate. Hope this helps somebody.
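A minimal Python 3 sketch of the same idea, skipping the text file: escape each string to ASCII and compare the escaped forms. To stay dependency-free it uses the standard library's difflib.SequenceMatcher in place of fuzzywuzzy, so the scores will not be identical to fuzzywuzzy's. The helper names and the candidate list are assumptions for illustration.

```python
from difflib import SequenceMatcher

def to_ascii(s):
    # "日本語" becomes the ASCII text \u65e5\u672c\u8a9e
    return s.encode("unicode_escape").decode("ascii")

def best_match(query, candidates):
    # Score every candidate by similarity of the escaped strings, keep the best.
    q = to_ascii(query)
    return max(candidates, key=lambda c: SequenceMatcher(None, q, to_ascii(c)).ratio())

candidates = ["日本語です。", "英語です", "中国語"]
print(best_match("日本語です", candidates))  # 日本語です。
```

In Python 3, where str is already Unicode, comparing the original strings directly is also an option; the escaping step here only mirrors the workaround described above.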

boost regex match non-whitespace and angle brackets

I may be asking a duplicate question, but I've spent a couple of hours googling this to no avail!
I'm trying to extract a string from some SIP URLs parsed by a program I'm working on. Here's an excerpt of the code. I'm passing in sipUrl, and have all the right includes etc:
static const boost::regex sipRegExp ("(sip:\\S+?#(?=\\S)[^>]+);");
boost::cmatch result;
boost::match_results<string::const_iterator> results;
boost::match_flag_type flags = boost::format_perl;
string newSipUrl;
cout << sipUrl << endl;
bool toggle = boost::regex_search(sipUrl, result, sipRegExp, flags);
if (toggle) {
    cout << result[1].str() << endl;
    newSipUrl = result[1].str();
}
cout << "new url: " << newSipUrl << endl;
I'm basically trying to extract the sip:user#IP from strings like "\"alex#192.168.1.2\"<sip:alex#192.168.1.2>;tag=fe310852" or "\"bob\"<sip:bob#foo.com>;", however, I can't get it to match! It worked fine when I wasn't using lookahead to try and remove the last angle bracket, but ever since then it fails to match.
Posting this just before running out of the door, so it may need more info. If anyone can spot something glaringly obvious, then that'd be a great help! And please feel free to point me at links that I might have missed!
Have you tried something simpler, such as this regex:
`sip:[a-zA-Z]*#[0-9a-zA-Z.]*`
It works in a terminal, but I haven't tried it through boost yet. If you start off with something simple and then add to it bit by bit to make it more specific, it will be easier to track down which part of the regex isn't working.
You missed the > before the semicolon:
"(sip:\\S+?#(?=\\S)[^>]+)>;"
Although actually you probably don't need the semicolon at all. Something like Scott's answer should be sufficient.
I ended up going with a modification of @David Knipe's comment. The winning regex was:
sip:\\S+#[^\\s>;]+
This matches with or without angle brackets, up to the semicolon. Both answers provided did work, but being able to remove the lookahead was quite nice. I also went with the + quantifiers to make some effort to find a valid URI and not a blank one.
Thanks for the help!
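For a quick check outside of boost, the final pattern can be tried against the two sample strings from the question. This is a Python sketch, not the C++ code from the thread; the single backslashes are the raw-string form of the C++ literal, the "#" separator is kept exactly as it appears in the post, and `extract_sip` is a made-up helper name.

```python
import re

# Final pattern from the post, written as a Python raw string.
SIP_RE = re.compile(r"sip:\S+#[^\s>;]+")

def extract_sip(s):
    # Return the first sip:user#host substring, or None if nothing matches.
    m = SIP_RE.search(s)
    return m.group(0) if m else None

samples = [
    '"alex#192.168.1.2"<sip:alex#192.168.1.2>;tag=fe310852',
    '"bob"<sip:bob#foo.com>;',
]
for s in samples:
    print(extract_sip(s))
```

The character class stops at whitespace, ">" or ";", which is why no lookahead is needed to drop the closing angle bracket.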

c++ char value + "-some words here-" but error C2110

I want a variable to contain "Hey you!".
In JavaScript we can
var str = 'Hey' + 'you!';
In a web language like PHP we can
$str = 'Hey'.'you!';
but in C++
neither + nor . can combine them.
Any ideas? I believe it's just a simple thing, but I really have no idea how to combine these in C++. Please help...
If I understood correctly, you just need
"Hey" "you"
(no punctuation in between). Adjacent string literals are concatenated by the compiler.
Just a note about the space:
NOTE: in all the samples the OP provided, you get "Heyyou" with no space in between.
I just reproduced the OP's request, so adding a space in this answer would be wrong, since it would not match the requirement.
If that was not the real intention (he actually wanted "Hey you"), then a space after Hey or before you is required.

Splitting a title into separate parts

I need to split a string of the form
2,9.1,The Godfather (1972), (it's a csv line)
to:
2
9.1
The Godfather
1972
Any ideas for a good regular expression?
BTW,
if you know a good regular expression generator that works from examples you provide, that would be great.
I'm a bit new to this.
10x!!
(\d+),(\d+\.\d+),(.*?)(?= \() \((\d{4})\)
^^^^^ ^^^^^^^^^^ ^^^^^^^^^^^^    ^^^^^^^
  2      9.1        Title        Year
I wouldn't recommend using regex to split CSV files, as it can't handle comma escaping well. That said, how about using the simplest available solution?
A simple regex like this should solve your problem:
'(.*?),(.*?),(.*?)\((\d+)\)'
A little time with Google gave me this: /,(?!(?:[^",]|[^"],[^"])+")/. It seems to split CSV just fine.
>>> '2,9.1,The Godfather (1972)'.split(/,(?!(?:[^",]|[^"],[^"])+")/)
["2", "9.1", "The Godfather (1972)"]
If you are sure that the format is static, you can use this:
(\d+),(\d+\.\d+),(.*?) \((\d+)\)
But if it can contain more information, use a real CSV parser to read the line and then just split The Godfather (1972) using (.*?) \((\d+)\).
CSV has a lot of corner cases, so your regexp approach might take you into a world of pain.
For example, if the title has a comma in it, the title would then be double-quoted, which would break all of the regexps given so far.
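As a rough sketch of the "real CSV parser plus a small regex" suggestion, Python's csv module can split the line and the (.*?) \((\d+)\) pattern from above can then separate title and year. The function name `parse_line` and the second sample line (a quoted title containing a comma) are made up for illustration.

```python
import csv
import re

# Title followed by a space and a parenthesised year, e.g. "The Godfather (1972)"
TITLE_YEAR = re.compile(r"(.*?) \((\d+)\)")

def parse_line(line):
    # csv.reader handles quoted fields, so a comma inside a title survives.
    rank, rating, title_field = next(csv.reader([line]))[:3]
    match = TITLE_YEAR.fullmatch(title_field)
    if match is None:
        raise ValueError("unexpected title format: %r" % title_field)
    title, year = match.groups()
    return rank, rating, title, year

print(parse_line('2,9.1,The Godfather (1972),'))
print(parse_line('3,9.0,"Godfather, The (1972)"'))  # hypothetical quoted title
```

Only the first three fields are used, so the trailing comma in the original line does no harm.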