How to create a regex to check whether a set of words exists in a given string? - regex

How can I write a regex to check if a set of words exist in a given string?
For example, I would like to check if a domain name contains "yahoo.com" at the end of it.
'answers.yahoo.com', would be valid.
'yahoo.com.answers', would be wrong. 'yahoo.com' must come in the end.
I got a hint from somewhere that it might be something like this.
"/^[^yahoo.com]$/"
But I am totally new to regex. So please help with this one, then I can learn further.

When asking regex questions, always specify the language or application, too!
From your history it looks like JavaScript / jQuery is most likely.
Anyway, to test that a string ends in "yahoo.com" use /.*yahoo\.com$/i
In JS code:
if (/.*yahoo\.com$/i.test (YOUR_STR) ) {
//-- It's good.
}
To test whether a set of words has at least one match, use:
/word_one|word_two|word_three/
To limit matches to just the most-common, legal sub-domains, ending with "yahoo.com", use:
/^(\w+\.)+yahoo\.com$/
(As a crude, first pass)
For other permutations, please clarify the question.

Related

Regex: How to extract dialogue tags from fiction, with speaker information

Totally stumped on this. I need help extracting dialogue from a story so I can hand it off for narration.
Basically, this is a problem where I have a big chunk of text (a novel), and I want to extract all the dialogue from the text in a format I can pipe into a spreadsheet.
But, I also want, if it exists, the speaker information as well. So, given a string like:
'"I'm really hungry," she said.'
I would like the values returned as:
[ "I'm really hungry", "she said" ]
If there is no dialogue, as in this example:
"I'm not hungry."
the result would just be:
["I'm really hungry."]
Is this madness? Is it even possible? I have fooled around with this regex (am not a regex guru, knowing only enough to be dangerous):
"([^"]*)"
Which seems to get the dialogue tags, but doesn't get the speaker info. Any advice in how to get the speaker info as well would be greatly appreciated. I've been wrestling with this for awhile now.
Maybe a better approach would be to get the dialogue in one field, and the entire paragraph it is found in as the second field. That could also work, but I have no idea where to start with this.
Basically I want to put these all into a spreadsheet so I can hand them off to a narrator with enough context that they know whose dialogue is who's in the story.
Any help is greatly appreciated!
It definitely is possible
Look at this regex: ^.*?'?(?P<line>\".*\")(?P<actor>[^'\n]*)'?.*?$
demo here: https://regex101.com/r/UCRZwY/5
It basically marks the outer quotes as optional, but if it does find them, stores whatever provided as '$actor' (and the line as '$line') these are of course just names i've given them, feel free to change
Note updated to include such text as part of regular sentence, see example in demo

Regex-Match while ignoring a char from Searchword

I am using an Engineering Program which lets me Code formulas in order to filter out specific lines in a database. I am trying to look for a certain line in the database which contains e.g. "concrete" as a property.
In the Code I can use regular expressions.
The regex I was using so far looked like this:
".*(concrete).*";
so if the line in the database contains concrete, I will get the wanted result.
Now the Problem is: i would like to switch the word concrete with a variable, so that it Looks like this:
".*(#VARIABLE1).*";
(the Syntax with the # works in the program btw.)
the Problem is: if i set the variable as concrete, the program automatically switches it for 'concrete' . Obviously, the word concrete cant be found anymore, since the searchterm now contains the two ' Symbols in the beginning and i the end.
Is there a way to ignore those two characters using the Right regex?
what I want it to do is the following:
If a line in the database contains "25cm concrete in Grey"
I should get a match from the regex.
with the searchterm ".*(concrete).*"; it works, with the variable ".*(#VARIABLE1).*"; it doesnt.
EDIT:
the whole "Formula" in the program Looks like that:
if(Match(QTO(Typ:="Attribut{FloorsLayer_02_MaterialName}");".*(#V_QUALITY).*" ;"regex") ;QTO(Typ:="Attribut{Fläche}");0)
I want the if-condition to be true, when the match inside is true.
the whole QTO function is just the programs Syntax to use a certain Attribute into the match-function, the middle part is my Problem. I really don't know the programming language or anything,I'm new to this. hope it helps!
Thats more of a hack than a real solution and i'm not sure if it even works:
if you use the regex
.*(#VARIABLE1)?).*
and the string ?concrete(
this will result in a regex looking like this:
.*('?concrete(')?).*
which makes the additional characters optional.
This uses the following assumtption:
the string (#VARIABLE1) gets replaced by the ('<content of VARIABLE1>')

String replacement in Perl after comparing with the original string

I want to compare a string in perl below is detail description
original string
../db/proj/upload/1/22352/eng_wall_paper.jpg
I need to extract the file name eng_wall_paper.jpg from the string
and compare it with variable and append the new variable to the string.
new required string
../db/proj/upload/1/22352/new string.jpg
how can it be done, thanks in advance .
Now as you can see in the comment in your question that you really need to be specific about where you are actually stuck in to ask a question in SO. However, after looking at your other questions in SO it looks like you are completely new in programming world and need serious helping hand. Hence I think it would be helpful for you to get some useful reference to solve your problem.
If I breakdown your main problem statement I can see three parts
1. Need to extract the file name,
Most recommended way to do it is using File::Basename. However, its possible to use regex too. You need to learn how to write regex and how to capture a group. This link should be really helpful.
2. Compare it with variable,
Whichever method you use from above you should get the filename in a variable or in $1. Just compare that whatever you want to compare with. Make sure to use eq instead of ==. Find more details here.
3. Append the new variable to the string,
By now you should have the old filename in a variable, say $oldname, from first step and the new filename in another variable, say $newfilename. This question is already answered several times in SO, like this one. Hope that helps.
The actual solution is merely a 3 or 4 lines of code which I want you to figure out. Good luck.

Using Regex to validate the number of words in a text area

I am attempting to write a MVC model validation that verifies that there is 10 or more words in a string. The string is being populated correctly, so I did not include the HTML. I have done a fair bit of research, and it seems that something along the lines of what I have tries should work, but, for whatever reason, mine always seem to fail. Any ideas as to what I am doing wrong here?
(using System.ComponentModel.DataAnnotations, in a mvc 4 vb.net environment)
Have tried ([\w]+){10,}, ((\\S+)\s?){10,}, [\b]{20,}, [\w+\w?]{10,}, (\b(\w+?)\b){10,}, ([\w]+?\s){10}, ([\w]+?\s){9}[\w], ([\S]+\s){9}[\S], ([a-zA-Z0-9,.'":;$-]+\s+){10,} and several more varaiations on the same basic idea.
<Required(ErrorMessage:="The Description of Operations field is required"), RegularExpression("([\w]+){20,}", ErrorMessage:="ERROZ")>
Public Property DescOfOperations As String = String.Empty
Correct Solution was ([\S]+\s+){9}[\S\s]+
EDIT Moved accepted version to the top, removing unused versions. Unless I am wrong and the whole sequence needs to match, then something like (also accounting for double spaces):
([\S]+\s+){9}[\S\s]+
Or:
([\w]+?\s+){9}[\w]+
Give this a try:
([a-zA-Z0-9,.'":;$-]+\s){10,}

Regexp to parse out a person's name?

This might be a hard one (if not impossible), but can anyone think of a regular expression that will find a person's name, in say, a resume? I know this won't be 100% accurate, but I can't come up with something.
Let's assume the name only shows up once in the document.
No, you can't use regular expressions for this. The only chance you have is if the document is always in the same format and you can find the name based on the context surrounding it. But this probably isn't the case for you.
If you are asking your applicants to submit their résumé online you could provide a separate field for them to enter their name and any other information you need instead of trying to automatically parse résumés.
Forget it - seriously.
Or expect to get a lot of applications from a Mr C Vitae
In my experience, having written something very similar (but a very long time ago), about 95% of resumes have the person's name as the very first line. You could probably have a pretty loose regex checking for alpha, hyphens, periods, and assume that's the name.
Obviously there's no way to do this 100% accurately, as you said, but this would be close.
Unless you wanted to build an expression that contained every possible name, or-ed together, the expression you are referring to is not "Regular," with a capital R. A good guess might be to go looking for the largest-font words in the document. If they follow a pattern that looks like firstname-lastname, name-initial-name, etc., you could call it a good guess...
That's a really hairy problem to tackle. The regex has to match two words that could be someone's name. The problem with that is that some people, of Hispanic origin, for example, might have a name that's more than 2 words. Also, how would you define two words to match for a name? Would you use a database of common first and last name fields? That might work unless someone has an uncommon name.
I'm reminded of a story of a COBOL teacher in college told me about an individual of Asian origin who's name would break every rule the programmers defined for a bank's internal system. His first name was "O." just the letter O.
The only remotely dependable way to nail down the regex would be if you had something to set off your search with; maybe if a line of text in the resume began with "Name: " then you'd know where to start looking.
tl;dr: People's names and individual resumes are too heavily varied for a regular expression to pick apart.
You could do something like Amazon does for book overviews: SIPs. This would require some after-the-fact double checking by humans but you might find the person's name(s) in there.