Regex question: spaces ruining the targeted words

I'm having trouble writing an expression to block certain words.
This is my current pattern.
I'm just using regex101.com to test it.
(^[GgƓɠḠḡǴǵĜĝǦǧĞğĢģǤǥĠġ](?:[^a-zA-Z]*)([ÂâÅåÀàÁáÃãÄäEeAaÆæ4#]+\s{0,30})([.*\S*]{0,1}$)|(?:[^a-zA-Z]*)[ÝýŶŷŸÿỸỹYy].)
I need this to find the word "gay", but if someone writes "gay man" with a space, it doesn't pick up the word "gay" at all. I'm just trying to figure out why the space lets the word through. I tried moving things around and adding whatever seemed like it might make sense, but nothing seems to click.

I want to share this post with you: https://stackoverflow.com/a/1732454/4867612
I'm not sure I understood correctly, but with only this pattern: [GgƓɠḠḡǴǵĜĝǦǧĞğĢģǤǥĠġ][ÂâÅåÀàÁáÃãÄäEeAaÆæ4#][ÝýŶŷŸÿỸỹYy]
you can detect whether the word "gay" appears in the text (see https://regex101.com/r/jYpHJS/1).
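A quick check of why the space matters, sketched with Python's `re` module instead of regex101 (an assumption; the flavors agree on these constructs). The first alternative of the original pattern ends in `([.*\S*]{0,1}$)`. Inside a character class `.` and `*` are literal, so this allows at most one more non-space character and then demands the end of the string. "gay" satisfies that, but in "gay man" the text " man" follows the "y", so the anchor fails. The sketch keeps only that first alternative, because the second one (`...[ÝýŶŷŸÿỸỹYy].`) matches any "y" plus one character on its own.

```python
import re

# First alternative of the original pattern: note the trailing {0,1}$ anchor.
anchored = re.compile(
    r"^[GgƓɠḠḡǴǵĜĝǦǧĞğĢģǤǥĠġ](?:[^a-zA-Z]*)"
    r"([ÂâÅåÀàÁáÃãÄäEeAaÆæ4#]+\s{0,30})([.*\S*]{0,1}$)"
)

# The answer's pattern: one character from each class, no anchors.
unanchored = re.compile(r"[GgƓɠḠḡǴǵĜĝǦǧĞğĢģǤǥĠġ][ÂâÅåÀàÁáÃãÄäEeAaÆæ4#][ÝýŶŷŸÿỸỹYy]")

for text in ["gay", "gay man"]:
    print(text, bool(anchored.search(text)), bool(unanchored.search(text)))
# gay True True
# gay man False True
```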

Related

Regex to unwrap paragraphs: Remove returns and new lines at end of lines that have content, but not empty lines

I'm using Text Soap by Unmarked Software on Mac OS, which is pretty much PCRE but uses ICU regular expression syntax for its regex find-and-replace tool. I'm still new to regex, so I'm still learning its many intricacies. Please be patient with me.
I'm struggling to capture the newlines or returns at the end of lines that have content, without capturing the newlines or returns of empty lines, or those immediately followed by an empty line.
I've tried using positive lookbehind and positive lookahead with multiline mode, but haven't been able to figure it out. With a bit of trial and error, I did figure out that $ is after newline/carriage return.
I am essentially trying to unwrap paragraphs but maintain them as paragraphs.
I want input such as this example:
"I need to unblock," someone may have breathed out.\n
\n
"I know how to do it," I may have responded, picking up\n
the cue. My life has always included strong internal directives.\n
Marching orders) I call them.\n
\n
In any case, I suddenly knew that I did know how to un-\n
block people and that I was meant to do so, starting then and\n
there with the lessons I myself had learned.\n
\n
Where did the lessons come from?\n
\n
In 1978, in January, I stopped drinking. I had never\n
thought drinking made me a writer, but now I suddenly\n
thought not drinking might make me stop. In my mind,\n
drinking and writing went together like, well, scotch and\n
soda. For me, the trick was always getting past the fear and\n
onto the page. I was playing beat the clock-trying to write be-\n
fore the booze closed in like fog and my window of creativity\n
was blocked again.\n
To output this:
"I need to unblock," someone may have breathed out.\n
\n
"I know how to do it," I may have responded, picking up the cue. My life has always included strong internal directives. Marching orders) I call them.\n
\n
In any case, I suddenly knew that I did know how to un-block people and that I was meant to do so, starting then and there with the lessons I myself had learned.\n
\n
Where did the lessons come from?\n
\n
In 1978, in January, I stopped drinking. I had never thought drinking made me a writer, but now I suddenly thought not drinking might make me stop. In my mind, drinking and writing went together like, well, scotch and soda. For me, the trick was always getting past the fear and onto the page. I was playing beat the clock-trying to write be-fore the booze closed in like fog and my window of creativity was blocked again.\n
If I understand correctly, you can use this regex:
(?<!\n)\n(?!\n)
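Replace matches with an empty string (or with a single space if the wrapped lines don't already end in one; otherwise "picking up" and "the cue" run together as "upthe").
If you want to match line breaks other than \n, replace every \n with the character or string you want to find instead. For example, if your newline is \r\n, use: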
(?<!\r\n)\r\n(?!\r\n)
Essentially, the regex finds a newline that neither follows nor is followed by another newline. And replacing by an empty string removes it.
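A minimal sketch of that lookaround approach, using Python's `re` rather than Text Soap's ICU engine (an assumption; both support this fixed-width lookbehind). The sample is the first few paragraphs of the question's input, with the newline after the last line omitted because a lone final newline would also be matched. It replaces with a space rather than an empty string so wrapped words don't run together.

```python
import re

text = (
    '"I need to unblock," someone may have breathed out.\n'
    "\n"
    '"I know how to do it," I may have responded, picking up\n'
    "the cue. My life has always included strong internal directives.\n"
    "Marching orders) I call them.\n"
    "\n"
    "Where did the lessons come from?"
)

# A newline neither preceded nor followed by another newline is a soft wrap
# inside a paragraph; the blank-line separators between paragraphs are untouched.
soft_wrap = re.compile(r"(?<!\n)\n(?!\n)")

unwrapped = soft_wrap.sub(" ", text)
print(unwrapped)
```

For `\r\n` files the same idea applies with the `\r\n` variant of the pattern.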
I cobbled together this rudimentary regex, but I suspect it may miss certain kinds of visually empty lines. I would greatly appreciate feedback on ways this regex could fail or be greedier than I expected, or on how to improve it. I welcome others to play around with my current solution on regex101.com and fork it, etc., to experiment or to teach me something.
(?<=.$)([\r\n\f\v]?)(?!^$)
Substitute $1 with \s.
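One way to probe this pattern for failure modes, sketched in Python's `re` with `re.MULTILINE` (Text Soap uses ICU, so line-ending handling, especially for `\r\n`, may differ; treat this as an approximation). It assumes "Substitute $1 with \s" means replacing each match with a single space. Because `[\r\n\f\v]?` is optional, when a newline is followed by an empty line the `(?!^$)` check fails, the engine backtracks to matching nothing, and the regex succeeds with an empty match just before that newline; the same happens at the very end of the text. Replacing those empty matches inserts stray spaces.

```python
import re

# The pattern from the question: a character at end of line behind us, an
# optional line-break character, and "not now sitting on an empty line".
pattern = re.compile(r"(?<=.$)([\r\n\f\v]?)(?!^$)", re.MULTILINE)

sample = "ab\ncd\n\nef"

# Assumption: "\s" in the replacement means a single space.
result = pattern.sub(" ", sample)
print(repr(result))  # 'ab cd \n\nef '

# The zero-width matches show up as (start, end) spans with start == end.
print([m.span() for m in pattern.finditer(sample)])  # [(2, 3), (5, 5), (9, 9)]

# Making the line break mandatory (no "?") removes the empty matches here.
fixed = re.compile(r"(?<=.$)([\r\n\f\v])(?!^$)", re.MULTILINE)
print(repr(fixed.sub(" ", sample)))  # 'ab cd\n\nef'
```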

select area within characters using regex (spaces are an issue)

Someone else asked a similar question earlier, which got a lot of downvotes, and I was interested in solving it. I ran into a similar issue and would like some help with it.
Consider this text:
__don't__ and __do it__
__yellow__
__green__ and __purple__
I would like to select each section enclosed in double underscores (__).
I attempted the following regex:
/__[!-~]+__/g, which worked great on most things. However, [!-~] covers only the printable ASCII characters from ! to ~, which excludes the space, so __do it__ is not matched. I would like to add the ability to have spaces within the underscores. I attempted the following:
/__[ -~]+__/g
It didn't work as planned, and selected everything from the very first __ to the very last. I was wondering how to tell the regex it has reached the end of a search once it sees a space after a __.
Here is the regex to play around with:
http://regexr.com/39br7
I tried adding __[^ ]/g at the end, but it didn't seem to help.
You could simply use the following regex:
__[^_]*__
__(.*?)__
This seems to work. Look at the demo:
http://regex101.com/r/lJ1jB1/1
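A side-by-side comparison of the asker's greedy attempt and the two answers, sketched with Python's `re.findall` on the sample text (regexr uses JavaScript, so this is an approximation, though these constructs behave the same there). `[ -~]+` is greedy and includes `_` itself, so on a line with two pairs it runs to the last `__`. `[^_]*` cannot cross an underscore, and `.*?` is lazy, stopping at the first `__` that lets the match finish.

```python
import re

text = "__don't__ and __do it__\n__yellow__\n__green__ and __purple__"

# Asker's attempt: greedy, and the class [ -~] also contains "_".
greedy = re.findall(r"__[ -~]+__", text)

# Negated class: the body may not contain any underscore.
negated = re.findall(r"__[^_]*__", text)

# Lazy quantifier; findall returns only the captured group (no underscores).
lazy = re.findall(r"__(.*?)__", text)

print(greedy)   # ["__don't__ and __do it__", '__yellow__', '__green__ and __purple__']
print(negated)  # ["__don't__", '__do it__', '__yellow__', '__green__', '__purple__']
print(lazy)     # ["don't", 'do it', 'yellow', 'green', 'purple']
```

One design difference: `[^_]*` also matches newlines and rejects single underscores inside (so `__snake_case__` is not matched), while `.*?` accepts single underscores but stays on one line.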

Detect URL in a string without any whitespace regexp

So I know that catching any URL is a very difficult task, and that's not what I'm trying to do. I want a piece of regex that will catch URLs in the form of
http://something.xx.yy
http://www.something.xxx
www.something.xx.yy
in a string that will contain lots of other text and no whitespace, for example:
hellopleasevisitwww.something.xxthankyou
I've tried my best to detect something like that by myself, but it's been pretty fruitless. Any help would be great. Below are some of the expressions I tried to modify to meet these requirements:
.*\\(?\\b(http://|www[.])[-A-Za-z0-9+&##/%?=~_()|!:,.;]*[-A-Za-z0-9+&##/%=~_()|].*
\\b\\w*\\(?\\b(http://|www[.])[-A-Za-z0-9+&##/%?=~_()|!:,.;]*[-A-Za-z0-9+&##/%=~_()|]\\w*\\b
\\(?\\b(http://|www[.])[-A-Za-z0-9+&##/%?=~_()|!:,.;]*[-A-Za-z0-9+&##/%=~_()|]
Thanks for your time
If it really can be as simple as you're saying...
(http://(www\\.)?|www\\.)[^.]+\\.(\\w{3}|\\w{2}\\.\\w{2})
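The expressions you tried all rely on \\b, which is a word boundary, and unfortunately your string has no word boundary where the URL begins: "visit" and "www" are both made of word characters, so there is no boundary between them.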
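A small check of both points, sketched in Python's `re`. The answer's `\\` is string-literal escaping, so the raw-string form below uses single backslashes (an assumption about the intended flavor). The sample strings other than the asker's are made up. The third case shows a limitation worth knowing: with no whitespace the TLD length can't be inferred, and because `\w{3}` is tried first it grabs the first letter of the following word.

```python
import re

# The answer's pattern with single backslashes (raw string).
url = re.compile(r"(http://(www\.)?|www\.)[^.]+\.(\w{3}|\w{2}\.\w{2})")

def find_url(text):
    """Return the first match of the pattern, or None."""
    m = url.search(text)
    return m.group(0) if m else None

print(find_url("hellopleasevisitwww.something.xx.yythankyou"))  # www.something.xx.yy
print(find_url("pleasegotohttp://www.something.xxxnow"))        # http://www.something.xxx
# Ambiguous case from the question: \w{3} takes "xxt".
print(find_url("hellopleasevisitwww.something.xxthankyou"))     # www.something.xxt

# Why \b-based patterns fail here: no boundary between "visit" and "www".
print(re.search(r"\bwww", "hellopleasevisitwww.something.xx"))  # None
```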

In what ways can I improve this regular expression?

I have written this regex that works, but honestly, it’s like 75% guesswork.
The goal is this: I have lots of imports in Xcode, like so:
#import <UIKit/UIKit.h>
#import "NSString+MultilineFontSize.h"
and I only want to return the categories that contain +. There are also lots of lines of code throughout the source which include + in other contexts.
Right now, this returns all of the proper lines throughout the Xcode project. But if there is one thing I’ve learned from googling and searching Stack Overflow for regex tutorials, it is that there are LOTS of different ways to do things. I’d love to see all of the different ways you guys can come up with that make it either more efficient or more bulletproof regarding potential spoofs or misses.
^\#import+.[\"]*+.(?:(?!\+).)*+.*[\"]
Thanks in advance for all of your help.
Update
Also I suppose I’ll accept the answer of whoever does this with the shortest string, without missing any possible spoofs. But again, thanks to everyone who participates in this learning experience.
Resources from answers
This is an awesome resource for practicing regex from Dan Rasmussen: RegExr
The first thing I notice is that your + characters are misplaced: t+. matches t one or more times, followed by any single character (.). I'm assuming you wanted to match the end of import, followed by one or more of any character: import.+
Secondly, # doesn't need to be escaped.
Here's what I came up with: ^#import\s+(.*\+.*)$
\s+ matches one or more whitespace characters, so you're guaranteed that the line actually starts with #import and not #importbutnotreally or anything else.
I'm not familiar with Xcode syntax, but the following part of the expression, (.*\+.*), simply matches any string with a + character somewhere in it. This means invalid imports may be matched, but I'm working under the assumption that you're trying to match valid code. If not, this will need to be modified to validate the import syntax as well.
P.S. To test your expression, try RegExr. You can hover over characters to check what they do.
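A quick way to see the suggested pattern at work, sketched in Python's `re` with `re.MULTILINE` so that `^` and `$` apply per line (Xcode's own find tool may differ). The source lines other than the two imports from the question are made up to show what gets excluded.

```python
import re

source = (
    "#import <UIKit/UIKit.h>\n"
    '#import "NSString+MultilineFontSize.h"\n'
    "total = count + 1;\n"
    '#importbutnotreally "Foo+Bar.h"\n'
)

# ^#import, at least one whitespace character, then a rest-of-line containing "+".
category_import = re.compile(r"^#import\s+(.*\+.*)$", re.MULTILINE)

print(category_import.findall(source))  # ['"NSString+MultilineFontSize.h"']
```

One caveat if you want it more bulletproof: `\s` also matches a newline, so a bare `#import` at the end of a line followed by a line containing `+` would match across the break; `[ \t]+` instead of `\s+` keeps the match on one line.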
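sed -n 's:^#import \(.*[+].*\):\1:p' FILE
will display only
"NSString+MultilineFontSize.h"
for your sample. (-n suppresses sed's default printing of every line, and the p flag prints just the lines where the substitution happened.)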

Create a valid CSV with regular expressions

I have a horribly formatted, tab-delimited "CSV" that I'm trying to clean up.
I would like to quote all the fields; currently only some of them are. I'm trying to go through, tab by tab, and add quotes if necessary.
This regex gives me all the tabs:
\t
This regex gives me the tabs that are not followed by a ":
\t(?!")
How do I get the tabs that are not preceded by a "?
Generally, for these kinds of problems, if it's a one-time occurrence, I will use Excel's capabilities or other applications (SSIS? T-SQL?) to produce the desired output.
A general-purpose regex will usually run into bizarre exceptions; getting it just right often takes longer, and it is prone to missing groups your regex didn't anticipate.
If this is going to happen regularly, try to fix the problem at the source and/or create a special utility program to do it.
Use negative lookbehind: (?<!")\t
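A sketch of how this lookbehind pairs with the asker's lookahead, in Python's `re`. The sample row and the `quote_fields` helper are hypothetical, and the approach is deliberately naive: it assumes no field contains a tab or a quote of its own, which is exactly the kind of exception the first answer warns about. Rows should be passed without their trailing newline.

```python
import re

# Hypothetical row: only some fields are already quoted.
line = '"alpha"\tbeta\t"gamma"\tdelta'

# Tabs whose preceding field does not end with a quote.
print([m.start() for m in re.finditer(r'(?<!")\t', line)])  # [12]

def quote_fields(row: str) -> str:
    """Naively wrap every tab-separated field in double quotes (hypothetical helper)."""
    # Close the field before a tab if it isn't already closed (lookbehind).
    row = re.sub(r'(?<!")\t', '"\t', row)
    # Open the field after a tab if it isn't already opened (lookahead).
    row = re.sub(r'\t(?!")', '\t"', row)
    # The first and last fields have a tab on only one side; \A and \Z
    # anchor to the very start and end of the row.
    row = re.sub(r'\A(?!")', '"', row)
    row = re.sub(r'(?<!")\Z', '"', row)
    return row

print(quote_fields(line))
```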
For one-offs like this, I usually just write a little program to clean up the data; that way I can also add some validation to make sure it really has converted properly after the run. I have nothing against regex, but in my case it often takes me longer to figure out the regex than to write a small program. :)
Edit: come to think of it, the main motivator is that it's more fun, for me at least :)