Regex problems in VB.net, how do I match this? - regex

I have never understood how to write regular expressions, and now I badly need one. It would be great if someone knows how to do this.
I need to match these examples with a regex and then append text before the third comma:
Examples:
1.
Örjan,,;Svensson,,,,, and then it
continues like this
Needs to become:
Örjan,,;SvenssonNEWTEXTHERE,,,,, and
then it continues like this
2.
Patric,The-Man,Black,,,,,,,,, and then
it continues like this
Needs to become:
Patric,The-Man,BlackNEWTEXTHERE,,,,,,,,, and then
it continues like this
If I would use wildcards to do this it would look like this:
*,*,*,*
And I would like to add text just before the last comma. I still need the whole string, though: the text should just be inserted there, and the characters that come after the inserted text must not disappear.
This is a .CSV contact file, by the way, so that you better understand the structure of the text.
Is this possible?

The regular expression for a CSV field, i.e. “any text not containing a comma”, is [^,]*. If you want to skip to the end of the third field, you’ll use
[^,]*,[^,]*,[^,]*
Now, if you want to modify the string, you can use something like
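Imports System.Text.RegularExpressions
Dim str = "Örjan,,;Svensson,,,,, and then it continues like this"
Dim re As New Regex("^[^,]*,[^,]*,[^,]*") ' anchored so the match starts at index 0
Dim pos = re.Match(str).Length ' index just past the end of the third field
Dim result = str.Insert(pos, "NEWTEXTHERE")
Now pos holds the position where the additional string goes, and String.Insert puts it there while leaving the rest of the line intact.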
Note that a CSV file can generally contain fields which contain literal commas and need to be quoted (e.g. Patric,"The,Man",Black,...). A quoted field may even contain a line break, which makes it quite difficult to parse properly, especially with regular expressions (and the code above would not work with such data). Can you be sure your CSV file does not contain quoted fields?
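For comparison, here is a minimal Python sketch of the same idea, plus a variant that tolerates quoted fields. The function names are made up for illustration, and the second function assumes each record fits on one line and that it is acceptable for the csv module to re-quote fields on output.

```python
import csv
import io
import re

# Matches from the start of the line to the end of the third field.
THIRD_FIELD_END = re.compile(r"^[^,]*,[^,]*,[^,]*")

def insert_before_third_comma(line, new_text):
    """Regex version: only safe when no field is quoted."""
    m = THIRD_FIELD_END.match(line)
    if m is None:  # fewer than two commas: leave the line alone
        return line
    pos = m.end()
    return line[:pos] + new_text + line[pos:]

def insert_into_third_field(line, new_text):
    """csv-module version: copes with quoted fields such as "The,Man"."""
    row = next(csv.reader([line]))
    if len(row) < 3:
        return line
    row[2] += new_text
    out = io.StringIO()
    csv.writer(out, lineterminator="").writerow(row)
    return out.getvalue()

print(insert_before_third_comma("Patric,The-Man,Black,,,,,,,,, and then", "NEWTEXTHERE"))
print(insert_into_third_field('Patric,"The,Man",Black,x', "NEW"))
```

The regex version mirrors the answer above; the csv version is the safer choice once quoted fields can appear.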

Related

Find and replace with regular expression in Notepad++

At the moment, I have a PHP function that gets the contents of a CSV file and puts it into a multi-dimensional array, which contains text that I print out in various places, using the indexes.
An example of use would be:
$localText[index][pageText][conceptQualityText][$lang];
The first index, [index], would be the name of the page. The second index [pageText] would indicate what it is (text for the page). The third index, [conceptQualityText] indicates what the actual text is. The last index, [$lang] gets the text in the desired language.
so:
->page location
->what is it
->the content
->what language it should be displayed in.
This all worked fine in the previous PHP versions. However, after upgrading to 7.2, PHP seems to be a bit more strict. I was a bit more green ~2 years ago when I first made this solution, and I now know that since these indexes aren't written as strings, i.e. enclosed in single quotes like so: ['index'], they fit the notation of a constant (as created with define()). I didn't give it much thought back then, but PHP now interprets them as constants, and so I get a warning that word x is an undefined constant.
My initial thought is to make a search and replace on my example string:
$localText[index][pageText][conceptQualityText][$lang];
using the regular expression functionality in Notepad++.
However, the example is just one of many, the notation of the array indexing is basically:
$localText[index][index2][index3][$lang];
So my question is:
How can I use the Notepad++ search and replace with a regular expression so that my index pointers become strings instead of being treated as constants?
e.g. make:
$localText[index][index2][index3][$lang];
into:
$localText['index']['index2']['index3'][$lang];
I will need some sort of logic that checks for whatever is inside the brackets and encapsulates them with single quotes, except for the last index, [$lang].
I tried to give as much information as possible, let me know if anything needs to be elaborated.
I tried to refer to these docs without much luck.
I found a solution using this:
find: \b(localText\[)([a-zA-Z0-9_\-]+)(\]\[)([a-zA-Z0-9_\-]+)(\]\[)([a-zA-Z0-9_\-]+)
replace: $1'$2'$3'$4'$5'$6'
and it works like a charm. Thanks to everyone who took the time to help.
You can use the following regex to match:
\[(\w+)\]
The regex matches a word between square brackets unless it is already quoted: \w matches neither a quote nor a '$', and the word is captured in group 1.
Replace with:
['$1']
The regex will not match the last pair of brackets because it contains a '$' sign.
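A quick way to check a bracket-quoting pattern before running it over real files in Notepad++ is to try it in a small script. The sketch below is in Python, not Notepad++ syntax; the function name is hypothetical, and the pattern is deliberately a little tighter than \w+ (it requires an identifier-like word, so numeric indexes such as [0] are left alone).

```python
import re

# A bare identifier between square brackets; quotes and '$' are excluded
# automatically because they are not word characters.
BARE_INDEX = re.compile(r"\[([A-Za-z_]\w*)\]")

def quote_indexes(code):
    """Wrap every bare index in single quotes, leaving [$var] and ['x'] alone."""
    return BARE_INDEX.sub(r"['\1']", code)

print(quote_indexes("$localText[index][pageText][conceptQualityText][$lang];"))
```

Running it a second time on its own output changes nothing, which makes the replacement safe to repeat.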

Finding and replacing a pattern with bold and normal characters

So as the title suggests I have a crazy thing that I need to do and was wondering if there is a faster way to do it. Basically I have a list in Word format. On each line there is data that looks like this:
Bold Text Normal Text
I need to insert something between the bold and normal text. Is there any way to find only the places that match that pattern (i.e. B space here N)? I could then easily insert what I need. Maybe something with regex?
OK, so a slightly extreme idea:
Is the document you are talking about a .docx? If not, I guess you can convert it to one.
I've tried this on a .docx file without a regex, but I'm sure you'll be able to take care of that part :)
So!
Extract the .docx file as a zip archive.
You can add .zip to the file name as an extension, or just open it with an archiver such as 7-Zip.
Navigate to the folder named word inside the extracted folder.
Open document.xml with your preferred editor.
Every part of the text that changes its style has a different tag.
Find a string that looks like this: <w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000"><w:rPr><w:b w:val="1"/><w:rtl w:val="0"/></w:rPr><w:t xml:space="preserve">bold text </w:t></w:r>
That ^ is what a styled run of text looks like.
The tag <w:b w:val="1"/>, with the value 1, indicates that the string inside ("bold text ") has the bold style.
Create a string that looks like what I've shown above, and insert the text you like. If, for example, you want the new text to have another style, such as italic, use <w:i w:val="1"/> (with i instead of b).
My example:
I wanted to add pictures, but I don't have enough reputation :(
It looks like:
Before: bold text normal text
After: bold text hi im new normal text
The XML example:
https://gist.github.com/arieljannai/08756ef562962eee0798
So the only thing you need to do now is build a regex that finds the parts with w:b tags and everything surrounding them, and then you have it :)
Good luck!
EDIT: A regex example I made that matches a styled run like the one in the example above:
(<w:r.*?>(?:<w:b\s{1}.*?\/>){1}.*?(?:<w:t\s{1}.*?>(.*?)<\/w:t>)<\/w:r>)
The regex matches a whole run, from the opening <w:r> tag to its closing tag (first capturing group).
The first non-capturing group makes sure the run has the bold tag ((?:<w:b\s{1}.*?\/>)).
The second non-capturing group finds the tag that contains the text (the <w:t> tag).
Inside the second non-capturing group is the second capturing group, (.*?), which holds the actual text of that run.
So you have the whole styled run in the first group, and only the actual text in the second group.
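To see the pattern at work, here is a small Python sketch that applies it to an XML string and inserts a new, unstyled run after each bold run. The function name and the shape of the inserted run are assumptions for illustration; a real document.xml has more structure, and because the pattern is not XML-aware it can start matching in an earlier non-bold run, so treat this as an experiment on a copy rather than a robust tool.

```python
import re
from xml.sax.saxutils import escape

# The regex from the answer: group 1 is the whole bold run, group 2 its text.
BOLD_RUN = re.compile(
    r'(<w:r.*?>(?:<w:b\s{1}.*?\/>){1}.*?(?:<w:t\s{1}.*?>(.*?)<\/w:t>)<\/w:r>)'
)

def insert_after_bold_runs(document_xml, new_text):
    """Append a plain run containing new_text after every matched bold run."""
    new_run = '<w:r><w:t xml:space="preserve">' + escape(new_text) + '</w:t></w:r>'
    return BOLD_RUN.sub(lambda m: m.group(1) + new_run, document_xml)

# Sample runs: the bold one is copied from the answer, the normal one is made up.
BOLD = ('<w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">'
        '<w:rPr><w:b w:val="1"/><w:rtl w:val="0"/></w:rPr>'
        '<w:t xml:space="preserve">bold text </w:t></w:r>')
NORMAL = '<w:r><w:t xml:space="preserve">normal text</w:t></w:r>'

print(BOLD_RUN.search(BOLD).group(2))
print(insert_after_bold_runs(BOLD + NORMAL, "hi im new "))
```

After editing, the modified document.xml would go back into the archive in place of the original.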

Regex: Replace every char in the search string IF they're found in order

I am building a search functionality and I am trying to make it similar to the one in Sublime Text.
Assume "cmd" as the input string and "command" is one of the results.
To search the files, among other things, I split that input into characters and end up with the following regex: c.*?m.*?d. This part is successful in finding files like "command". However, when I use the same regex to replace the found string with some HTML elements, to highlight the fact that the searched string was found in that particular item, the result is something like this:
<span>command</span>
I understand exactly why this is happening, and I'm looking for an alternative that displays something like the following to the user:
<span>c</span>o<span>m</span><span>m</span>an<span>d</span>
Or, maybe just:
<span>c</span>o<span>m</span>man<span>d</span>
I have an idea of how to do this, which is to enclose every single character in parentheses and then replace each one with the <span>$x</span> part, but I'm not sure exactly how to do this.
Any kind of help is immensely appreciated.
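One way to realise the capture-group idea described in the question is sketched below in Python (the question does not say which language the search is written in, so this is only an illustration, and the function name is hypothetical). Each query character gets its own group, the lazy gaps between them get groups too, and the output is rebuilt from the groups.

```python
import re
from html import escape

def highlight(query, candidate):
    """Wrap each matched query character in <span>, or return None if no match."""
    # "cmd" becomes (c)(.*?)(m)(.*?)(d): odd groups are hits, even groups are gaps.
    pattern = "(.*?)".join("(" + re.escape(ch) + ")" for ch in query)
    m = re.search(pattern, candidate, re.IGNORECASE)
    if m is None:
        return None
    parts = [escape(candidate[:m.start()])]
    for i, text in enumerate(m.groups()):
        if i % 2 == 0:   # a query character
            parts.append("<span>" + escape(text) + "</span>")
        else:            # the text between two query characters
            parts.append(escape(text))
    parts.append(escape(candidate[m.end():]))
    return "".join(parts)

print(highlight("cmd", "command"))
# <span>c</span>o<span>m</span>man<span>d</span>
```

Because the gaps are lazy, the first possible occurrence of each character is the one highlighted, which gives the second of the two outputs shown in the question.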

How to use a regular expression in notepad++ to change a url

I need some help with our migrated site's URLs. We moved our site from Joomla to WordPress, and in our posts we have over 20K internal links.
The structure of these links is like this:
www.mysite.nl/current-post-title/index.php?option=com_content&view=article&id=5259:related-post-title&catid=35:universum&Itemid=48
What we need is this:
www.mysite.nl/related-post-title
So basically we need to remove everything after www.mysite.nl/ up to the colon :, i.e. remove this: current-post-title/index.php?option=com_content&view=article&id=5259: (the colon itself must be removed too)
And then remove everything behind the first ampersand (including the ampersand itself) until the end of the string, i.e. remove &catid=35:universum&Itemid=48
Of course only url strings containing this index.php?option=com_content must be changed.
I have dumped the table in plain text and opened it in Notepad++ to do a search and replace with regular expression because the content that must be removed from these lines is different every time.
Can someone please help me with the right regular expression?
In the Find what box, enter:
(www\.mysite\.nl)/.*index\.php\?option=com[^:]+:([^&]+)&.*
In the Replace with box, enter:
\1/\2
Result
www.mysite.nl/related-post-title
Go inside-out rather than outside-in: replace \/.+&id=\d+\:(.+?)&.+ with /$1. Also, paste a few into http://www.regexr.com/ and play around, although JavaScript and Notepad++ might have some differences in implemented regex features, e.g. negative lookbehinds.
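Since the links sit inside post content, a trailing .* or .+ will also swallow whatever follows the URL on the same line. The Python sketch below shows one way to limit the match to the URL itself. It assumes a URL ends at whitespace, a quote, or an angle bracket, which may not hold for every post; the function name is hypothetical.

```python
import re

# Characters assumed to terminate a URL inside post content.
END = r"""\s"'<>"""

OLD_LINK = re.compile(
    rf"(www\.mysite\.nl)/[^{END}]*?index\.php\?option=com_content"
    rf"[^:{END}]*:([^&{END}]+)&[^{END}]*"
)

def rewrite_links(text):
    """Turn old Joomla article links into www.mysite.nl/<slug>."""
    return OLD_LINK.sub(r"\1/\2", text)

old = ("www.mysite.nl/current-post-title/index.php?option=com_content"
       "&view=article&id=5259:related-post-title&catid=35:universum&Itemid=48")
print(rewrite_links('<a href="http://' + old + '">read more</a> and other text'))
```

The same character classes should carry over to the Notepad++ Find what box, but that is worth testing on a copy of the dump first.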

RegEx: Match Mr. Ms. etc in a "Title" Database field

I need to build a RegEx expression which gets its text strings from the Title field of my Database. I.e. the complete strings being searched are: Mr. or Ms. or Dr. or Sr. etc.
Unfortunately this field was a free field and anything could be written into it. e.g.: M. ; A ; CFO etc.
The expression needs to match on everything except: Mr. ; Ms. ; Dr. ; Sr. (NOTE: The list is a bit longer but for simplicity I keep it short.)
WHAT I HAVE TRIED SO FAR:
This is what I am using successfully on another field:
^(?!(VIP)$).* (This will match every string except "VIP")
I rewrote that expression to look like this:
^(?!(Mr.|Ms.|Dr.|Sr.)$).*
Unfortunately this did not work. I assume this is because the "." (dot) is a reserved symbol in RegEx and needs special handling.
I also tried:
^(?!(Mr\.|Ms\.|Dr\.|Sr\.)$).*
But no luck either.
I looked around in the forum and tested some other solutions but could not find any that works for me.
I would like to know how I can build my expression so that it checks the complete (short) string and matches everything except "Mr." etc. Any help is appreciated!
Note: My Question might seem unusual and seems to have many open ends and possible errors. However the rest of my application is handling those open ends. Please trust me with this.
If you simply want your string not to start with one of those prefixes, then do this:
^(?!([MDS]r|Ms)\.).*$
The above simply ensures that the beginning of the string (^) is not followed by one of your listed prefixes. (You shouldn't even need the .*$ but this is in case you're using some engine that requires a complete match.)
If you want your string to not have those prefixes anywhere, then do:
^((?!([MDS]r|Ms)\.).)*$
The above ensures that no position in the string, including the very first one, is followed by one of your listed prefixes, all the way to the end (so the $ is necessary in this one).
I just read that your list of prefixes may be longer, so here is an expanded form that is easier to add to:
^((?!(Mr|Ms|Dr|Sr)\.).)*$
If the strings to exclude consist entirely of one of the prefixes, then just do this:
^(?!(Mr|Ms|Dr|Sr)\.$).*$
And if you want to make the dot optional:
^(?!(Mr|Ms|Dr|Sr)\.?$).*$
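The three readings of "does not match a title" (not exactly a title, not starting with one, not containing one) are easy to confuse, so here is a small Python sketch that contrasts them on a few sample values. The patterns are written for Python's re module and the sample values are made up; other engines may differ in detail.

```python
import re

PREFIXES = r"(?:Mr|Ms|Dr|Sr)\."

# Whole field is not exactly one of the titles.
NOT_EXACTLY = re.compile(rf"^(?!{PREFIXES}$).*$")
# Field does not start with one of the titles.
NOT_STARTING = re.compile(rf"^(?!{PREFIXES}).*$")
# Field does not contain one of the titles at any position.
NOT_ANYWHERE = re.compile(rf"^(?:(?!{PREFIXES}).)*$")

for value in ["Mr.", "M.", "CFO", "Mr. Smith", "Hello Dr. X"]:
    print(
        repr(value),
        bool(NOT_EXACTLY.match(value)),
        bool(NOT_STARTING.match(value)),
        bool(NOT_ANYWHERE.match(value)),
    )
```

For a free-text Title field where only the exact values should be excluded, the first pattern is the one that corresponds to the question.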
With | we can list any number of prefix patterns to match against the string.
var pattern = /^(Mrs\.|Mr\.|Ms\.|Dr\.|Er\.)\s?[A-Za-z]+$/; // dots escaped; [A-Za-z]+ covers the whole name
var str = "Mrs.Panchal";
console.log(str.match(pattern));
this may do it
/(?!.*?(?:^|\W)(?:(?:Dr|Mr|Mrs|Ms|Sr|Jr)\.?|Miss|Phd|\+|&)(?:\W|$))^.*$/i
from that page I mentioned
Rather than trying to construct a regex that matches anything except Mr., Ms., etc., it would be easier (if your application allows it) to write a regex that matches only those strings:
/^(Mr|Ms|Dr|Sr)\.$/
and just swap the logic for handling matching vs non-matching strings.
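As a sketch of what swapping the logic could look like, the Python snippet below matches only the valid titles and treats everything else as the "non-matching" set. The function name and sample data are hypothetical; the short title list stands in for the longer one mentioned in the question.

```python
import re

# Positive pattern: the whole field must be exactly one of the known titles.
VALID_TITLE = re.compile(r"(?:Mr|Ms|Dr|Sr)\.")

def split_titles(titles):
    """Return (valid, invalid) lists based on a full match against VALID_TITLE."""
    valid, invalid = [], []
    for title in titles:
        if VALID_TITLE.fullmatch(title):
            valid.append(title)
        else:
            invalid.append(title)
    return valid, invalid

valid, invalid = split_titles(["Mr.", "M.", "A", "CFO", "Dr.", "Mr"])
print(valid)    # ['Mr.', 'Dr.']
print(invalid)  # ['M.', 'A', 'CFO', 'Mr']
```

Using fullmatch keeps the pattern free of anchors and lookaheads, which tends to make the title list easier to extend.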
import re
name = re.sub(r'^([MmDdSs][RSrs]{1,2}|[Mm]iss)\.? ', '', name)  # strip a leading title such as "Mr. " from name