I'm trying to retrieve 2 fields from a web page. I'm using the following two patterns:
string paternExperience = @"Experience\s\:\s\<strong\>(?<Level>.*?)\<";
string paternAccount = @"account_value\""\>(?<Account>.*?)\<";
and the following method to retrieve the values, which works:
Regex.Matches(pageBody, patern..., RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline);
I wanted to avoid calling the method twice to retrieve the two values, so I'm trying to create a pattern that gets Level and Account in a single call to Matches. I thought something like the one below should work:
string paternBoth = @"Experience\s\:\s\<strong\>(?<Level>.*?)\< .* account_value\""\>(?<Account>.*?)\<";
But it doesn't work, I think because the two values are on different lines in the HTML. So I added RegexOptions.Singleline, and now the method times out (the page is around 20 KB).
Can you help me please with some advice? Thank you!
You could try putting those two values in one variable, then check just that variable with your regex.
I know it doesn't really make sense, but I try things like that and sometimes they actually work.
I've never had this exact scenario, but I have had similar problems in the past.
It might not be the best way, but sometimes making it work is more important than making it look pretty. ;)
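For the combined pattern in the question, here is a minimal sketch in Python rather than C#. The page fragment is made up, since the real markup is not shown. The greedy .* between the two parts is swapped for a lazy .*? on the assumption that this cuts down the backtracking a dot-matches-newline mode can cause on a large page. That has not been measured against the original page.

```python
import re

# Hypothetical page fragment; the real markup is not shown in the question.
page_body = """<div>Experience : <strong>Level 12</strong></div>
<p>some other markup</p>
<span class="account_value">1,250</span>"""

# One pattern with two named groups. re.DOTALL lets "." cross line breaks
# (the counterpart of RegexOptions.Singleline), and the lazy ".*?" between
# the two parts stops at the first account_value instead of the last one.
both = re.compile(
    r'Experience\s:\s<strong>(?P<Level>.*?)<'
    r'.*?'
    r'account_value">(?P<Account>.*?)<',
    re.IGNORECASE | re.DOTALL,
)

match = both.search(page_body)
level = match.group("Level") if match else None
account = match.group("Account") if match else None
print(level, account)
```

If the two values can appear in either order, running the two original patterns separately is probably the simpler option.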
I'm trying to keep only the lines that contain the word "NOA" in column A, which has many multi-line cells, as can be seen in this Google Spreadsheet.
If "NOA" is present, I would like to keep the line. The input and output should look like the image, which I have "working" with too many helper cells. Can this be combined into a single formula?
Theoretical Approaches:
I have been thinking about three approaches to solve this:
ARRAYFORMULA(REGEXREPLACE - couldn't get it to work
JOIN(FILTER(REGEXMATCH(TRANSPOSE - showing promise as it works in multiple steps
Using the QUERY Function - unfamiliar w/ function but wondering if this function has a fast solution
Practical attempts:
FIRST APPROACH: First I attempted using REGEXEXTRACT to extract everything that did not have NOA in it. The regex worked in a demo but didn't work properly in Sheets. I thought this might be a concise way to get the value, perhaps if my regex skills were better?
ARRAYFORMULA(REGEXREPLACE(A1:A7, "^(?:[^N\n]|N(?:[^O\n]|O(?:[^A\n]|$)|$)|$)+",""))
I think the regex became overly complex and either doesn't work in Google Sheets or could be improved. Google's RE2 has limitations that make certain things harder to do.
SECOND APPROACH:
Then I came up with an alternative approach, which seems to work in two stages (with multiple helper cells), but I would like to do this with one formula.
=TRANSPOSE(split(A2,CHAR(10)))
=TEXTJOIN(CHAR(10),1,FILTER(C2:C7,REGEXMATCH(C2:C7,"NOA")))
Questions:
Can these formulas be combined and applied to the entire Column using an Index or Array?
Or perhaps, the REGEX in my first approach can be modified?
Is there a faster solution using Query?
The shared Google spreadsheet is here.
Thank you in advance for your help.
Here's one way you can do that:
=index(substitute(substitute(transpose(trim(
query(substitute(transpose(if(regexmatch(split(
filter(A2:A,A2:A<>""),char(10)),"NOA"),split(
filter(A2:A,A2:A<>""),char(10)),))," ","❄️")
,,9^9)))," ",char(10)),"❄️"," "))
First, we split the data by the newline (char 10), then we filter out the lines that don't contain NOA and finally we use a "query smush" to join everything back together.
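The split, filter and join steps described above can be sketched outside Sheets. This Python version is only an illustration of the logic, not a replacement for the formula. The cell contents and the function name keep_matching_lines are made up for the example.

```python
def keep_matching_lines(cell: str, needle: str = "NOA") -> str:
    """Keep only the lines of a multi-line cell that contain `needle`."""
    lines = cell.split("\n")                            # split on newline, like SPLIT(..., CHAR(10))
    kept = [line for line in lines if needle in line]   # like the REGEXMATCH test
    return "\n".join(kept)                              # join back, like the query smush


# Hypothetical column A contents.
cells = ["NOA 123\nXYZ 456\nNOA 789", "ABC 1\nDEF 2", "QRS 5\nNOA 6"]
results = [keep_matching_lines(c) for c in cells]
print(results)
```

A cell with no matching line comes back as an empty string, which mirrors the blank row the formula would produce.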
What's the best way to remove a query string (the question-mark variables) from an image URL?
Say I have an image URL such as
http://i.ebayimg.com/00/s/MTYwMFgxNjAw/z/zoMAAOSwMpZUniWv/$_12.JPG?set_id=880000500F
But I can't save it properly without adding a bunch of useless checking code, because of the query string crap after it.
I just need
http://i.ebayimg.com/00/s/MTYwMFgxNjAw/z/zoMAAOSwMpZUniWv/$_12.JPG
I'm looking for a regular expression that handles this, so I can replace the query string with an empty string.
It might be simple enough that you don't need a regex.
This would work, as long as the URL actually contains a "?" (otherwise IndexOf returns -1 and Substring throws):
Dim cleaned = url.Substring(0, url.IndexOf("?"c))
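The same idea can be done with a URL parser instead of string slicing, which also covers URLs that have no query string at all. This is a minimal Python sketch using the standard urllib.parse module. The helper name strip_query is made up for the example, and it also drops any #fragment, which may or may not be wanted.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL without its query string (and without any fragment)."""
    parts = urlsplit(url)
    # Rebuild from scheme, host and path only; empty query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


url = "http://i.ebayimg.com/00/s/MTYwMFgxNjAw/z/zoMAAOSwMpZUniWv/$_12.JPG?set_id=880000500F"
cleaned = strip_query(url)
print(cleaned)
```

A URL without a "?" passes through unchanged, so no extra check is needed before calling it.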
I am attempting to write an MVC model validation that verifies that there are 10 or more words in a string. The string is being populated correctly, so I did not include the HTML. I have done a fair bit of research, and it seems that something along the lines of what I have tried should work, but for whatever reason, mine always seems to fail. Any ideas as to what I am doing wrong here?
(using System.ComponentModel.DataAnnotations, in an MVC 4 VB.NET environment)
I have tried ([\w]+){10,}, ((\\S+)\s?){10,}, [\b]{20,}, [\w+\w?]{10,}, (\b(\w+?)\b){10,}, ([\w]+?\s){10}, ([\w]+?\s){9}[\w], ([\S]+\s){9}[\S], ([a-zA-Z0-9,.'":;$-]+\s+){10,} and several more variations on the same basic idea.
<Required(ErrorMessage:="The Description of Operations field is required"), RegularExpression("([\w]+){20,}", ErrorMessage:="ERROZ")>
Public Property DescOfOperations As String = String.Empty
The correct solution was ([\S]+\s+){9}[\S\s]+
EDIT: Moved the accepted version to the top and removed the unused versions. If the whole string needs to match, then something like this works (it also accounts for double spaces):
([\S]+\s+){9}[\S\s]+
Or:
([\w]+?\s+){9}[\w]+
Give this a try:
([a-zA-Z0-9,.'":;$-]+\s){10,}
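To see how the accepted pattern ([\S]+\s+){9}[\S\s]+ behaves when the whole string has to match, here is a small Python check. It uses re.fullmatch to stand in for whole-string validation, which is an assumption about how the attribute applies the pattern. The sample sentences and the helper name has_ten_words are made up.

```python
import re

# Nine "word + whitespace" groups, then at least one more character.
TEN_WORDS = re.compile(r"(\S+\s+){9}[\S\s]+")


def has_ten_words(text: str) -> bool:
    """True when the entire string matches the ten-or-more-words pattern."""
    return TEN_WORDS.fullmatch(text) is not None


ten = "one two three four five six seven eight nine ten"
nine = "one two three four five six seven eight nine"
spaced = "one  two three four five six seven eight nine ten eleven"

print(has_ten_words(ten), has_ten_words(nine), has_ten_words(spaced))
```

One edge case to keep in mind: nine words followed by two or more trailing spaces would also pass, because the final [\S\s]+ accepts whitespace.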
I'm in the process of learning regular expressions but still can't quite wrap my head around them. However, I need to create one for Google Analytics and was hoping someone could help out.
Currently my Goal page is head-match:
/checkout/cart?complete
and funnel step:
/checkout/onepage
The problem is that the funnel step could be any of several slightly different URLs. It could be:
/checkout/onepage
/checkout/onepage/index
/checkout/multishipping/login
/checkout/multishipping/billing
/checkout/multishipping/shipping
Can anyone tell me what the expression would be to "lump" those 5 potential URLs together as the same thing? Also, what would I change my Goal URL to if the potential outcomes could be one of the examples below:
/checkout/cart?complete=10000245 <-- (single order)
/checkout/cart?complete=10000245,10000246,10000247 <-- (multiship order)
I know I would have to escape the question mark first, but after that I'm not sure.
For your goal page you'll want to use the +, ? and * operators:
/checkout/cart\?complete(=(\d+,?)*)?
For the funnel you'll want the | and ? operators:
/checkout/(onepage(/index)?|multishipping/(login|billing|shipping))
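A quick way to sanity-check the two expressions above is to run them against the URLs from the question. This Python sketch uses re.fullmatch so that each URL has to be covered completely. Google Analytics itself may apply the expression unanchored, so this is only a check of the patterns, not of GA's matching behaviour.

```python
import re

goal = re.compile(r"/checkout/cart\?complete(=(\d+,?)*)?")
funnel = re.compile(r"/checkout/(onepage(/index)?|multishipping/(login|billing|shipping))")

goal_urls = [
    "/checkout/cart?complete",
    "/checkout/cart?complete=10000245",
    "/checkout/cart?complete=10000245,10000246,10000247",
]
funnel_urls = [
    "/checkout/onepage",
    "/checkout/onepage/index",
    "/checkout/multishipping/login",
    "/checkout/multishipping/billing",
    "/checkout/multishipping/shipping",
]

# Every URL from the question should be matched in full by its pattern.
goal_ok = all(goal.fullmatch(u) for u in goal_urls)
funnel_ok = all(funnel.fullmatch(u) for u in funnel_urls)
print(goal_ok, funnel_ok)
```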
My question may be a bit strange, but it's been bothering me since the behavior is not what I expected. Here is my query:
query = request.GET.get('q','')
#in search_indexes:
#start_datetime = indexes.DateTimeField(model_attr='start_datetime',null=True)
#end_datetime = indexes.DateTimeField(model_attr='end_datetime')
search_events = (
    SearchQuerySet().models(Event)
    .filter(content=query)
    .filter(end_datetime__gte=datetime.now())
    .order_by("start_datetime")
)
Now I type in a query like "asdfasdfjasldf lolol hwtf asdlfka" and I still get 3 results. (Note, I only have 5 events to start with. Not sure if that could affect anything.) I print out the scores, and they are [42,42,42]. Doesn't filter() match on exact phrases? Especially if I use quotes?
//edit
I also tried using auto_query, and the results are the same.
I'm really confused about what's happening, so hopefully somebody can help clear this up. Thanks in advance!
Turns out that someone else on my team had set HAYSTACK_DEFAULT_OPERATOR to 'OR' instead of 'AND'. That explains everything: the additional filter was actually expanding the number of results!
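Why an OR default makes a second filter widen the result set can be shown with a toy model in plain Python. This does not use Haystack at all. The events, the fixed "now" and the helper names are made up, and the two list comprehensions only imitate how two chained conditions combine under AND versus OR.

```python
from datetime import datetime, timedelta

now = datetime(2024, 1, 1)  # fixed "now" so the example is repeatable

# Hypothetical events: two still running, one already over.
events = [
    {"title": "jazz night", "end": now + timedelta(days=1)},
    {"title": "book fair", "end": now + timedelta(days=2)},
    {"title": "old concert", "end": now - timedelta(days=5)},
]


def matches_text(event, query):
    return query in event["title"]


def not_ended(event):
    return event["end"] >= now


query = "asdfasdfjasldf"  # nonsense query that matches no title

# AND: both conditions must hold, so a nonsense query returns nothing.
and_results = [e for e in events if matches_text(e, query) and not_ended(e)]
# OR: either condition is enough, so every event that has not ended comes back.
or_results = [e for e in events if matches_text(e, query) or not_ended(e)]

print(len(and_results), len(or_results))
```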
You might like to perform the search using auto_query():
search_events = (
    SearchQuerySet().models(Event)
    .auto_query(query)
    .filter(end_datetime__gte=datetime.now())
    .order_by("start_datetime")
)
It has some extra features, for example exact-phrase searching when the phrase is enclosed in quotes.