Find And Replace with Multiple Words - replace

I am trying to replace the same word in multiple files with different words.
For instance, document 1 should search for word x and replace it with word 1 from the list.
Document 2 replaces word x with word 2 from the list.
Document 3 replaces word x with word 3 from the list, and so on.
I have tried TextCrawler and some macros, but I don't know if this is possible or what the best way to start is.
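One possible way to approach this outside a text editor is a short Python script. The sketch below assumes the documents are plain-text files and that the replacement list is in the same order as the files; the function name replace_per_file and the file names in the usage note are hypothetical.

```python
from pathlib import Path

def replace_per_file(paths, target, replacements):
    """Replace `target` in the i-th file with the i-th replacement word."""
    if len(paths) != len(replacements):
        raise ValueError("need exactly one replacement word per file")
    for path, word in zip(paths, replacements):
        path = Path(path)
        text = path.read_text(encoding="utf-8")
        # str.replace swaps every occurrence of the substring `target`
        path.write_text(text.replace(target, word), encoding="utf-8")
```

A call might look like replace_per_file(["doc1.txt", "doc2.txt"], "x", ["word1", "word2"]). Note that this replaces substrings, so a target that also occurs inside longer words would need a regex with \b word boundaries instead.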

Related

RegexReplace the nth occurrence of a string of underscores

I'm having trouble getting a REGEXREPLACE working in a Google Sheets formula. I'm aiming to replicate a certain card game which is opposed to humankind. I have a cell containing a string which contains one, two or three occurrences of a series of underscores, e.g.
"_____ is the new _____"
And let's say I want to substitute in the strings "Orange" for the first occurrence, and "Black" for the second occurrence.
I don't know how many underscores will be in each string, it could be one or more, so it seems like a job for regex. I tried SUBSTITUTE and it didn't seem to recognise asterisks. Based on this link, I tried using {1} {2} and {3} to match the first/second/third occurrence, but I'm not doing something right:
=REGEXREPLACE(G16,".*(_*){1}.*",G17)
G16 is: _____ is the new _____.
G17 is: Orange
The output of the formula is: OrangeOrange.
Can anyone help me figure out the correct way to do this?
You may use
=REGEXREPLACE(REGEXREPLACE(G16,"^([^_]*)_+","$1Orange"), "^([^_]*)_+", "$1Black")
The inner REGEXREPLACE replaces the first occurrence. The outer REGEXREPLACE then replaces the second occurrence, which by that point is the first remaining run of underscores.
Details
^ - start of string
([^_]*) - Capturing group 1 ($1 will refer to this group value): 0 or more chars other than an underscore
_+ - 1 or more underscores.
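For comparison, the same idea (treat each run of underscores as one blank and fill the blanks in order) can be sketched outside Sheets in Python. This is only an illustration, not the formula above: fill_blanks is a hypothetical helper, and it uses a replacement function instead of nested calls.

```python
import re

def fill_blanks(template, words):
    """Replace the 1st, 2nd, ... run of underscores with successive words."""
    remaining = iter(words)
    # Each match of _+ is one blank; if the words run out, leave the blank as is
    return re.sub(r"_+", lambda m: next(remaining, m.group(0)), template)

print(fill_blanks("_____ is the new _____", ["Orange", "Black"]))
# prints: Orange is the new Black
```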

Joining two lines based on specific characters Notepad ++

I'm trying to join lines of data information in Notepad ++, currently, the data looks like this:
It has the above format for about 100,000 rows. I want to combine row 1 with row 2, but sometimes row 2 and row 3 combine and look something like this:
I want the output to look like this (all on one line):
I tried using this formula:
SEARCH: (.+)\R(.+)
REPLACE: \1 \2
If you want to match specific characters in a regex, you can simply type them. For example, apple will only match apple. If you want to match a digit, you can use \d. This will match 8, but not d.
If you want to match only lines that end in four digits separated by a dot (two before and two after), try this one: \n(.*?\d\d\.\d\d)\n
An explanation for each part can be found here.

How Can I Create a RegEx Pattern that will Get N Words Using Custom Word Boundary?

I need a RegEx pattern that will return the first N words using a custom word boundary that is the normal RegEx white space (\s) plus punctuation like .,;:!?-*_
EDIT #1: Thanks for all your comments.
To be clear:
I'd like to set the characters that would be the word delimiters
Let's call this the "Delimiter Set", or strDelimiters
strDelimiters = ".,;:!?-*_"
nNumWordsToFind = 5
A word is defined as any contiguous text that does NOT contain any character in strDelimiters
The RegEx word boundary is any contiguous text that contains one or more of the characters in strDelimiters
I'd like to build the RegEx pattern to get/return the first nNumWordsToFind using the strDelimiters.
EDIT #2: Sat, Aug 8, 2015 at 12:49 AM US CT
#maraca definitely answered my question as originally stated.
But what I actually need is to return the number of words ≤ nNumWordsToFind.
So if the source text has only 3 words, but my RegEx asks for 4 words, I need it to return the 3 words. The answer provided by maraca fails if nNumWordsToFind > number of actual words in the source text.
For example:
one,two;three-four_five.six:seven eight nine! ten
It would see this as 10 words.
If I want the first 5 words, it would return:
one,two;three-four_five.
I have this pattern using the normal \s whitespace, which works, but NOT exactly what I need:
([\w]+\s+){<NumWordsOut>}
where <NumWordsOut> is the number of words to return.
I have also found this word boundary pattern, but I don't know how to use it:
a "real word boundary" that detects the edge between an ASCII letter
and a non-letter.
(?i)(?<=^|[^a-z])(?=[a-z])|(?<=[a-z])(?=$|[^a-z])
However, I would want my words to allow numbers as well.
IAC, I have not been able to work out how to use the above custom word boundary pattern to return the first N words of my text.
BTW, I will be using this in a Keyboard Maestro macro.
Can anyone help?
TIA.
All you have to do is adapt your pattern ([\w]+\s+){<NumWordsOut>} to the following, which also handles some special cases:
^[\s.,;:!?*_-]*([^\s.,;:!?*_-]+([\s.,;:!?*_-]+|$)){<NumWordsOut>}
The parts of the pattern, in order:
1. ^[\s.,;:!?*_-]* matches any number of delimiters before the first word.
2. [^\s.,;:!?*_-]+ matches a word (= at least one non-delimiter).
3. [\s.,;:!?*_-]+ requires the word to be followed by at least one delimiter.
4. |$ lets the word sit at the end of the string instead (in case no delimiter follows at the end).
5. {<NumWordsOut>} repeats 2. to 4. <NumWordsOut> times.
Note that I moved the - to the end of the character class: it has to be at the start or end, otherwise it needs to be escaped as \-.
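As a rough sketch of how the pattern could be built from strDelimiters and nNumWordsToFind, here is a Python version. The function name first_n_words is hypothetical. It uses the quantifier {1,N} rather than {N}, which is an assumption added here (not part of the answer above) so that a text with fewer than N words still returns the words it has, as EDIT #2 asks.

```python
import re

def first_n_words(text, delimiters=".,;:!?-*_", n=5):
    """Return the leading part of `text` containing at most `n` words."""
    # Whitespace plus the delimiter set; re.escape protects '-' and
    # other characters that are special inside a character class
    cls = r"\s" + re.escape(delimiters)
    # {1,n} (an assumption, see lead-in) allows fewer than n words
    pattern = rf"^[{cls}]*(?:[^{cls}]+(?:[{cls}]+|$)){{1,{n}}}"
    m = re.search(pattern, text)
    return m.group(0) if m else ""

sample = "one,two;three-four_five.six:seven eight nine! ten"
print(first_n_words(sample, n=5))
# prints: one,two;three-four_five.
```

The match includes the delimiter that follows the last word, as in the expected output from the question; strip it afterwards if only the words are wanted.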
Thanks to #maraca for providing the complete answer to my question.
I just wanted to post the Keyboard Maestro macro that I have built using #maraca's RegEx pattern for anyone interested in the complete solution.
See KM Forum Macro: Get a Max of N Words in String Using RegEx

Regular Expressions in R

I found somewhat similar questions
R - Select string text between two values, regex for n characters or at least m characters,
but I'm still having trouble
Say I have a string in R:
testing_String <- "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7"
And I need to be able to pull anything between the first element in the string that contains 2 characters (AK) and PADK ADK. PADK and ADK will vary in content but will always be 4 and 3 characters long, respectively.
So I would need to pull
ADAK NAS
I came up with this, but it's picking up everything from AK to ADK:
^[A-Za-z0_9_]{2}(.*?) +[A-Za-z0_9_]{4}|[A-Za-z0_9_]{3,}
If I understood your question correctly, this should do the trick:
\b[A-Z]{2}\s+(.+?)\s+[A-Z]{4}\s+[A-Z]{3}\b
Demo
You'll have to switch on the perl = TRUE option (to use a decent regex engine).
\b means word boundary. So this pattern looks for a match starting with a 2-letter word and ending with a 4 letter word followed by a 3 letter word. Your value will be in the first group.
Alternatively, you can write the following to avoid using the capturing group:
\b[A-Z]{2}\s+\K.+?(?=\s+[A-Z]{4}\s+[A-Z]{3}\b)
But I'd prefer the first method because it's easier to read.
Lookbehind is supported for perl=TRUE, so this regex will do what you want:
(?<=\w{2}\s).*?(?=\s+[^\s]{4}\s[^\s]{2})
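To check how these patterns behave on the sample string, here is a small sketch in Python rather than R (Python's re module does not support \K, so only the capturing-group and lookaround variants are shown). The variable names are illustrative only.

```python
import re

testing_string = "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7"

# Capturing-group version: the wanted text ends up in group 1
m = re.search(r"\b[A-Z]{2}\s+(.+?)\s+[A-Z]{4}\s+[A-Z]{3}\b", testing_string)
captured = m.group(1) if m else None

# Lookaround version: the whole match is the wanted text
m2 = re.search(r"(?<=\w{2}\s).*?(?=\s+[^\s]{4}\s[^\s]{2})", testing_string)
looked = m2.group(0) if m2 else None

print(captured)  # prints: ADAK NAS
print(looked)    # prints: ADAK NAS
```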

regex account for single letters or multiple based on list

My goal is to find all matches in a word list using regex. I got it working, but I would like to be able to specify multiple letters and have each letter occur only the number of times it is specified.
My code:
import re

with open('wordlist.txt') as f:
    content = f.readlines()

def search(regex):
    pattern = re.compile(regex)
    for word in content:
        word = word.strip()
        if pattern.findall(word):
            print(word)
Examples:
search(r'^(b|e|a|c|h|e|s|q|r){7}$') should match only 7-letter words made up solely of those 9 letters. In this case, beaches would be returned.
search(r'^(f|o|o|c|l|t){4}$') should match only 4-letter words made up solely of those 6 letters. In this case, foot, fool and colt would be returned.
search(r'^(f|o|d|c|l|t){4}$') should match only 4-letter words made up solely of those 6 letters. In this case, only colt would be returned.
I don't think that a regex is the way to go here. You don't care about order, just about how many of each letter there are. That sounds like a job for an array or a dict.
How about making the argument to search a dict where the keys are each letter, and the values are the number of times that letter is allowed to appear? Then just deep copy the dict, iterate over the string, and decrement. If the key's not found, or it's already 0, fail and move on to the next string.
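A minimal sketch of that idea follows. It uses collections.Counter in place of a hand-copied dict with manual decrements, which is a swapped-in variation on the approach described above; the signature of search (a word list, a string of allowed letters, and a word length) is an assumption for illustration.

```python
from collections import Counter

def search(words, letters, length):
    """Return words of the given length that can be built from `letters`."""
    pool = Counter(letters)  # how many times each letter may appear
    matches = []
    for word in words:
        word = word.strip()
        # Counter subtraction keeps only positive counts, so an empty
        # result means no letter is used more often than the pool allows
        if len(word) == length and not (Counter(word) - pool):
            matches.append(word)
    return matches

print(search(["foot", "fool", "colt", "food"], "fooclt", 4))
# prints: ['foot', 'fool', 'colt']
```

With the letters "fodclt", the same word list yields only colt, because foot and fool each need two o's.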