I want to find all occurrences of freeWifi = "Y" and state = "NY" in a JSON string, but when there are two consecutive occurrences the regex treats them as one match instead of two.
The pattern I used is '"freeWifi": "Y",(\s+\S+)+state": "NY"'.
When I use '"freeWifi": "Y",(\s+\S+){5}state": "NY"' I get the desired result, but it is not general enough: it breaks as soon as new lines are added to the JSON file.
Part of the data:
"freeWifi": "Y",
"storeNumber": "14372",
"phone": "(305)672-7055",
"state": "NY",
"storeUrl": "http://www.mcflorida.com/14372",
"playplace": "N",
"address": "1601 ALTON RD",
"storeType": "FREESTANDING",
"archCard": "Y",
"driveThru": "Y"
}
"type": "Feature",
"properties": {
"city": "MIAMI",
"zip": "33135",
"freeWifi": "Y",
"storeNumber": "7408",
"phone": "(305)285-0974",
"state": "NY",
"storeUrl": "http://www.mcflorida.com/7408",
"playplace": "Y",
"address": "1400 SW 8TH ST",
"storeType": "FREESTANDING",
"archCard": "Y",
"driveThru": "Y"
}
},
{
Part II
After implementing Steven's solution, I tried it on the data file with many entries; the program ran forever and did not give an answer.
The new regex is: '"freeWifi": "Y",(\s+?\S+?)+?state": "NY"'.
To see why it hangs, I ran the program against a prefix of the data, increasing the size by 100,000 bytes each time. The results show a significant slowdown as the size increases, which suggests a problem with the regex, as explained in "Program run forever when matching regex".
Time_Passed  Size_Checked  File_Size  Matches
7.3e-05      100000        8345167    30
0.008906     200000        8345167    30
0.466485     300000        8345167    31
0.500054     400000        8345167    75
0.523969     500000        8345167    142
0.553361     600000        8345167    201
0.586032     700000        8345167    201
1.072181     800000        8345167    338
1.114541     900000        8345167    482
1.157304     1000000       8345167    630
1.203889     1100000       8345167    630
1.625656     1200000       8345167    630
3.126974     1300000       8345167    630
6.501044     1400000       8345167    630
12.476704    1500000       8345167    630
The lazy operator is ?. With it, your expression becomes "freeWifi": "Y",(\s+?\S+?)+state": "NY" (see the example on regexr).
As @anubhava has pointed out, this is not going to work on generic input. For example, I imagine that you don't want this to match:
"type": "Feature",
"properties": {
"freeWifi": "Y",
"storeNumber": "9876",
"state": "PA"
}
},
"type": "Feature",
"properties": {
"freeWifi": "N",
"storeNumber": "1234",
"state": "NY",
}
},
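One way to keep a match from spilling across objects, and to avoid the nested quantifier (\s+\S+)+ that appears to cause the slowdown, is to replace it with a negated character class. The following is only a sketch of that swapped-in technique, not the pattern from the answers above. It assumes that "freeWifi" always comes before "state" inside an object, and that no "}" character appears between the two keys. The sample strings are shortened copies of the data shown in this thread.

```python
import re

# [^}]*? cannot step over a closing brace, so a match stays inside one object.
# Assumption: no nested object or "}" inside a value between the two keys.
PATTERN = re.compile(r'"freeWifi": "Y",[^}]*?"state": "NY"')


def count_matches(text: str) -> int:
    """Count objects where freeWifi is "Y" and a later state is "NY"."""
    return sum(1 for _ in PATTERN.finditer(text))


two_stores = '''
"freeWifi": "Y",
"storeNumber": "14372",
"state": "NY",
"driveThru": "Y"
}
"properties": {
"freeWifi": "Y",
"storeNumber": "7408",
"state": "NY"
}
'''

mixed = '''
"properties": {
"freeWifi": "Y",
"storeNumber": "9876",
"state": "PA"
}
},
"properties": {
"freeWifi": "N",
"storeNumber": "1234",
"state": "NY"
}
'''

print(count_matches(two_stores))  # 2: consecutive stores are counted separately
print(count_matches(mixed))       # 0: the PA/NY pair spans two objects
```

If the file is valid JSON, loading it with the json module and checking the two keys on each object is likely the more robust route, since it does not depend on key order or formatting.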
I have documents like this..
"auction_list": [
{
"auction_id": 11368494,
"domain": "51love.cn",
"utf_name": "51love.cn",
"is_idn": false,
"auction_type": "backorder",
"currency": "USD",
"current_bid_price": "36.00",
"bids": 25,
"bidders": 4,
"time_left": "4 days, 5 hours",
"start_time": "2021/06/15 14:30 PST",
"start_time_stamp": 1623792604841,
"end_time": "2021/06/22 03:02 PST",
"end_time_stamp": 1624356147000,
"estibot_appraisal": "$0.00"
},
{
"auction_id": 11381539,
"domain": "meiguihualove.cn",
"utf_name": "meiguihua.cn",
"is_idn": false,
"auction_type": "backorder",
"currency": "USD",
"current_bid_price": "15.99",
"bids": 5,
"bidders": 4,
"time_left": "5 days, 5 hours",
"start_time": "2021/06/16 14:30 PST",
"start_time_stamp": 1623879010264,
"end_time": "2021/06/23 03:02 PST",
"end_time_stamp": 1624442573000,
"estibot_appraisal": "$0.00"
},
{
"auction_id": 11273186,
"domain": "surpass.cn",
"utf_name": "surpass.cn",
"is_idn": false,
"auction_type": "backorder",
"currency": "USD",
"current_bid_price": "14.99",
"bids": 4,
"bidders": 4,
"time_left": "1 day, 5 hours",
"start_time": "2021/06/09 14:30 PST",
"start_time_stamp": 1623274205156,
"end_time": "2021/06/19 03:02 PST",
"end_time_stamp": 1624096958000,
"estibot_appraisal": "$40.00"
}
I have 30,000 objects in total in that array.
I want to search for the objects whose domain contains the word "love" (using a regex), and then slice the results: first items 1 to 50, then items 51 to 100.
I tried:
ExpiredDomains.find({auction_list:{$elemMatch:{domain:'/love/'}}})
and
ExpiredDomains.find({auction_list:{$elemMatch:{domain:{$regex: 'love'}}})
but both return nothing.
How can I do that? I am using Node.js.
Thanks in advance.
The aggregation below filters the array elements with a regex:
db.collection.aggregate([
{
"$project": {
"auction_list": {
"$filter": {
"input": "$auction_list",
"as": "list",
cond: {
"$regexMatch": {
"input": "$$list.domain",
"regex": /love/
}
}
}
}
}
},
{
"$unwind": "$auction_list"
},
{
"$skip": 0
},
{
"$limit": 50
}
])
Edit: for limiting and skipping, assign two variables:
const a = 0;
const b = 50;
// and use them like this:
{
"$skip": a
},
{
"$limit": b
}
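Then change them dynamically. $skip ignores the first x records, and $limit caps how many records are returned after that. So skip = 50 with limit = 50 gives you the second 50 records (skip = 50 with limit = 100 would return up to 100 records, starting at the 51st).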
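The skip/limit arithmetic can be sketched as a small helper that builds the pipeline for a given page. This is a minimal sketch in Python; build_page_pipeline, page and page_size are hypothetical names introduced here, and the returned list is meant to be passed to a driver's aggregate() call (for example pymongo's collection.aggregate). It assumes a server version that supports $regexMatch.

```python
from typing import Any, Dict, List


def build_page_pipeline(pattern: str, page: int, page_size: int = 50) -> List[Dict[str, Any]]:
    """Build the aggregation stages for one page of regex-filtered auctions.

    page is 1-based: page 1 skips 0 items, page 2 skips page_size items, etc.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return [
        {
            "$project": {
                "auction_list": {
                    "$filter": {
                        "input": "$auction_list",
                        "as": "list",
                        "cond": {
                            "$regexMatch": {"input": "$$list.domain", "regex": pattern}
                        },
                    }
                }
            }
        },
        {"$unwind": "$auction_list"},
        # Skip the items that belong to earlier pages, then cap the page size.
        {"$skip": (page - 1) * page_size},
        {"$limit": page_size},
    ]


second_page = build_page_pipeline("love", page=2)
print(second_page[2], second_page[3])  # {'$skip': 50} {'$limit': 50}
```

Keeping the limit equal to the page size and moving only the skip is what yields items 1 to 50, then 51 to 100, and so on.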
I need to find all the results that start with a given input, for example: "Paul", "pau", "paul Gr", "Paul Green", "Paul Gree", "Pel", "pele", "joh", "john", etc. The search has to be case-insensitive.
It is supposed to return all of these (the input search string is at least 3 characters long):
[
{
"_id": ObjectId("5e6ffe413f71835ae3aa4b60"),
"f": "Paul",
"id": 11811,
"l": "Pelè",
"r": 64
},
{
"_id": ObjectId("5e6ffe413f71835ae3aa4b65"),
"f": "paul",
"id": 11811,
"l": "walker",
"r": 64
},
{
"_id": ObjectId("5e6ffe413f71835ae3aa4b66"),
"f": "johnny",
"id": 11811,
"l": "Green",
"r": 64
}
]
I tried the following:
contain_searched_term_players = list(
    db.players_collection.find({'$or': [
        {'f': {'$regex': searched_player_name_string, '$options': 'i'}},
        {'l': {'$regex': searched_player_name_string, '$options': 'i'}},
        {'c': {'$regex': searched_player_name_string, '$options': 'i'}},
    ]}).sort([('r', -1)])
)
but it doesn't work for "Paul Green".
searched_player_name_string is the given input (one of the inputs above, for example "Paul Green").
You need to provide a correct regex for the query condition:
^(Paul Green|Paul Gree|Paul|paul|pau|Gr|pele|Pel|john|joh)
RegexPlayground
searched_player_name_string = "^(Paul Green|Paul Gree|Paul|paul|pau|Gr|pele|Pel|john|joh)"
result_cursor = db.players_collection.find({
"$or": [
{
"f": {
"$regex": searched_player_name_string,
"$options": "i"
}
},
{
"l": {
"$regex": searched_player_name_string,
"$options": "i"
}
},
{
"c": {
"$regex": searched_player_name_string,
"$options": "i"
}
}
]
})
found_players = list(result_cursor)
MongoPlayground
Split your input into separate strings, run the query on each, and append the results together (checking first that a player has not already been found). Finally, sort the results:
import pprint

searched_player_name_string = 'Paul Green'
found_players = []
for regex in searched_player_name_string.split():
    contain_searched_term_players = db.players_collection.find({'$or': [
        {'f': {'$regex': regex, '$options': 'i'}},
        {'l': {'$regex': regex, '$options': 'i'}},
        {'c': {'$regex': regex, '$options': 'i'}}]})
    for player in contain_searched_term_players:
        # The next line avoids creating duplicate answers if there are multiple matches for the same player
        if player['_id'] not in [o['_id'] for o in found_players]:
            found_players.append(player)

# Sort the output by "r" - highest first
pprint.pprint(sorted(found_players, key=lambda o: o['r'], reverse=True))
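Since the question asks for names that start with the input, the two answers can be combined: split the input into words, escape each one, and anchor the alternation with ^. The sketch below only illustrates that idea in memory, without a database. build_prefix_regex and matches_player are hypothetical helper names, and the fourth player record is made up for the example. The string returned by build_prefix_regex could presumably be passed as the '$regex' value with '$options': 'i'.

```python
import re


def build_prefix_regex(search_text: str) -> str:
    """Turn 'Paul Green' into '^(?:Paul|Green)', escaping regex metacharacters."""
    tokens = [re.escape(token) for token in search_text.split()]
    if not tokens:
        raise ValueError("search text must contain at least one word")
    return "^(?:" + "|".join(tokens) + ")"


def matches_player(player: dict, search_text: str) -> bool:
    """True if any of the fields f, l or c starts with one of the search words."""
    pattern = re.compile(build_prefix_regex(search_text), re.IGNORECASE)
    return any(pattern.search(str(player.get(field, ""))) for field in ("f", "l", "c"))


players = [
    {"f": "Paul", "l": "Pelè", "r": 64},
    {"f": "paul", "l": "walker", "r": 64},
    {"f": "johnny", "l": "Green", "r": 64},
    {"f": "Diego", "l": "Maradona", "r": 64},  # made-up record that should not match
]

hits = [p for p in players if matches_player(p, "Paul Green")]
print([(p["f"], p["l"]) for p in hits])
```

Escaping each word matters when the input comes from a user, because characters such as "(" or "." would otherwise be read as regex syntax.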
Response:
{
"service_name": "signup",
"message": "Sign Up has been done successfully",
"global_error": "",
"error": [],
"data": {
"session_key": "8f29d7c93e7089841208e94a7d98fc22",
"user_profile": {
"user_id": 65,
"user_unique_id": "e9a03a8ede",
"dob": "Dec 06, 1998",
"first_name": "FC7155313",
"last_name": "FC1791398",
"user_name": "FCwqim178",
"email": "fc_slekjbp#mailinator.com",
"phone_no": "3362239492",
"balance": "0",
"status": "2",
"image": "http://dummy.projects.com/app/assets/img/default_user.png",
"currency": "$",
"profile_status": 1,
"require_otp": false,
"existing_user": 0,
"master_country_id": null,
"master_state_id": "3919"
},
"verification_link": "http://dummy.projects.com/activation/ZTlhMDNhOGVkZV8xNTY3Mjk3MjAw",
"send_email_otp": false,
"send_phone_otp": false
},
"response_code": 200
}
I am using JMeter and want to pass the value "ZTlhMDNhOGVkZV8xNTY3Mjk3MjAw" to the next API call.
This value is generated dynamically for each new user registration.
But my regex is not working.
The regular expression that I wrote:
,"verification_link":"http://dummy.projects.com/activation/(.+?)","send_email_otp":
Template: $1$
Match No.: 1
Your response seems to be a JSON entity, so there is a high chance that it looks like this:
{
"service_name": "signup",
"message": "Sign Up has been done successfully",
"global_error": "",
"error": [
],
"data": {
"session_key": "8f29d7c93e7089841208e94a7d98fc22",
"user_profile": {
"user_id": 65,
"user_unique_id": "e9a03a8ede",
"dob": "Dec 06, 1998",
"first_name": "FC7155313",
"last_name": "FC1791398",
"user_name": "FCwqim178",
"email": "fc_slekjbp#mailinator.com",
"phone_no": "3362239492",
"balance": "0",
"status": "2",
"image": "http://dummy.projects.com/app/assets/img/default_user.png",
"currency": "$",
"profile_status": 1,
"require_otp": false,
"existing_user": 0,
"master_country_id": null,
"master_state_id": "3919"
},
"verification_link": "http://dummy.projects.com/activation/ZTlhMDNhOGVkZV8xNTY3Mjk3MjAw",
"send_email_otp": false,
"send_phone_otp": false
},
"response_code": 200
}
so the "send_email_otp" bit can easily end up on the next line, and your regular expression will not match anything in that situation.
I would recommend amending your regex to look something like:
"verification_link":\s?"http://dummy.projects.com/activation/(\w+)"
References:
JMeter: Regular Expressions
Using RegEx (Regular Expression Extractor) with JMeter
Perl 5 Regex Cheat sheet
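To check a pattern like this outside JMeter, a few lines of Python are enough. This is only a sketch: SAMPLE_RESPONSE is a shortened copy of the response above, token_via_regex and token_via_json are hypothetical helper names, and the pattern is a slight variation on the recommended one (dots escaped, \s* instead of \s?).

```python
import json
import re

# Shortened copy of the pretty-printed response from the question.
SAMPLE_RESPONSE = """{
  "service_name": "signup",
  "data": {
    "session_key": "8f29d7c93e7089841208e94a7d98fc22",
    "verification_link": "http://dummy.projects.com/activation/ZTlhMDNhOGVkZV8xNTY3Mjk3MjAw",
    "send_email_otp": false
  },
  "response_code": 200
}"""

# Whitespace after the colon is optional; nothing after the closing quote
# is required, so line breaks between fields do not matter.
LINK_PATTERN = re.compile(
    r'"verification_link":\s*"http://dummy\.projects\.com/activation/(\w+)"'
)


def token_via_regex(body: str):
    """Return the activation token captured by the regex, or None."""
    match = LINK_PATTERN.search(body)
    return match.group(1) if match else None


def token_via_json(body: str) -> str:
    """Return the last path segment of data.verification_link."""
    link = json.loads(body)["data"]["verification_link"]
    return link.rsplit("/", 1)[-1]


print(token_via_regex(SAMPLE_RESPONSE))
print(token_via_json(SAMPLE_RESPONSE))
```

Both helpers return the same token for this sample. The JSON route does not depend on how the response is formatted, which is the weakness of the original expression.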
The (.+?) part is a lazy match: it matches one or more characters, as few as possible, but it still expands until the rest of the pattern can match. Your expression fails because what follows the group, ","send_email_otp":, does not appear in the response in that exact form (the real response has whitespace and line breaks between the fields), so the regex as a whole never matches. Try this instead:
http:\/\/dummy\.projects\.com\/activation\/(.+)\",
Regex Demo
In the example snippet below I have some JSON which needs to be edited (over 1400 entries). I need to achieve 2 things:
1. In this example line: "phone": "+44 2079693900", I need to remove the whitespace between +44 and 2079693900, for all records, resulting in "+442079693900".
2. For latitude and longitude I need to get rid of the double quotes around the numbers, as the API I am using only accepts these values as floats. Example: "latitude": "51.51736", needs to be: "latitude": 51.51736
I am most familiar with Ruby, and have done some parsing of JSON with this in the past, but I thought Regex would be the best tool to use for this kind of basic data cleaning task. I have referred to regex101.com and regular-expressions.info but I'm pretty stuck at this point. Thanks in advance!
[
{
"id": "101756",
"name": "1 Lombard Street",
"email": "reception#1lombardstreet.com",
"website": "http://www.1lombardstreet.com",
"location": {
"latitude": "51.5129",
"longitude": "-0.089",
"address": {
"line1": "1 Lombard Street",
"line2": "",
"line3": "",
"postcode": "EC3V 9AA",
"city": "London",
"country": "UK"
}
}
},
{
"id": "105371",
"name": "108 Brasserie",
"phone": "+44 2079693900",
"email": "enquiries#108marylebonelane.com",
"website": "http://www.108brasserie.com",
"location": {
"latitude": "51.51795",
"longitude": "-0.15079",
"address": {
"line1": "108 Marylebone Lane",
"line2": "",
"line3": "",
"postcode": "W1U 2QE",
"city": "London",
"country": "UK"
}
}
},
{
"id": "108701",
"name": "1901 Restaurant",
"phone": "+44 2076187000",
"email": "london.restres#andaz.com",
"website": "http://www.andazdining.com",
"location": {
"latitude": "51.51736",
"longitude": "-0.08123",
"address": {
"line1": "Andaz Hotel",
"line2": "40 Liverpool Street",
"line3": "",
"postcode": "EC2M 7QN",
"city": "London",
"country": "UK"
}
}
},
{
"id": "102190",
"name": "2 Bridge Place",
"phone": "+44 2078028555",
"email": "fb#dtlondonvictoria.com",
"website": "http://crimsonhotels.comdoubletreelondonvictoriadiningpre-theatre-dining",
"location": {
"latitude": "51.49396",
"longitude": "-0.14343",
"address": {
"line1": "2 Bridge Place",
"line2": "Victoria",
"line3": "",
"postcode": "SW1V 1QA",
"city": "London",
"country": "UK"
}
}
},
{
"id": "102063",
"name": "2 Veneti",
"phone": "+44 2076370789",
"email": "2veneti#btconnect.com",
"website": "http://www.2veneti.com",
"location": {
"latitude": "51.5168",
"longitude": "-0.14673",
"address": {
"line1": "10 Wigmore Street",
"line2": "",
"line3": "",
"postcode": "W1U 2RD",
"city": "London",
"country": "UK"
}
}
},
You can use the following regex:
("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"
With the following replacement:
$1$2$3
The idea is to capture what we want and not capture what we do not, and then use backreferences to restore the substrings we want to keep.
Regex explanation:
The pattern contains 2 alternatives joined with | alternation operator:
("phone":\s*"\+44)\s+:
("phone":\s*"\+44) - the 1st capturing group, matching the literal "phone": followed by optional whitespace, then the opening quote and +44 literally
\s+ - 1 or more whitespaces that we'll remove
("(?:latitude|longitude)":\s*)"([^"]+)":
("(?:latitude|longitude)":\s*) - the second capturing group matching "latitude": or "longitude": and 0 or more whitespace characters
" - Literal " that we'll drop
([^"]+) - the third capturing group matching 1 or more characters other than " (we'll keep that)
" - again, a literal " that we'll drop.
See demo
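The same pattern and replacement can be tried in Python, where the backreferences are written \1\2\3 instead of $1$2$3. This is a minimal sketch that relies on Python 3.5 or later, where a group that did not take part in the match is replaced with an empty string. clean_record is a hypothetical helper name and the sample text is taken from the data above.

```python
import re

CLEANUP = re.compile(
    r'("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"'
)


def clean_record(text: str) -> str:
    """Drop the space after +44 and unquote latitude/longitude values."""
    # Only one alternative matches at a time; the groups from the other
    # alternative are empty, so \1\2\3 restores just the parts to keep.
    return CLEANUP.sub(r"\1\2\3", text)


sample = (
    '"phone": "+44 2079693900",\n'
    '"latitude": "51.51795",\n'
    '"longitude": "-0.15079",'
)

print(clean_record(sample))
# "phone": "+442079693900",
# "latitude": 51.51795,
# "longitude": -0.15079,
```

Note that the pattern unquotes whatever is between the quotes, so a non-numeric latitude value would leave invalid JSON behind.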
I am working with JSON data and the response comes back fine:
[
{
"s": "1",
"sent": "September, 11 2014 18:19:10 -0400",
"f": "user1",
"m": "the place",
"fr": "user2"
},
{
"s": "1",
"sent": "September, 11 2014 18:19:19 -0400",
"f": "user2",
"m": "that once decided",
"fr": "user1"
},
{
"s": "1",
"sent": "September, 11 2014 18:19:23 -0400",
"f": "user1",
"m": "on your side",
"fr": "user2"
},
{
"s": "1",
"sent": "September, 11 2014 18:19:43 -0400",
"f": "user2",
"m": "actually i moved the text",
"fr": "user1"
},
{
"s": "1",
"sent": "September, 11 2014 18:20:06 -0400",
"f": "user2",
"m": "nothing specific",
"fr": "user1"
}
]
I would like to remove the starting [ and the ending ] from the JSON. Currently I have to write two Replace calls to do it, and it works sometimes and sometimes it does not. What would be a suitable regex to sort out this issue? Please guide.
Regards
You can do it with one replace using a regex with an alternation (|): ^\s*\[|]\s*$. That says: match any series of whitespace followed by [ at the beginning of the string (^\s*\[), or match ] followed by any series of whitespace at the end of the string (]\s*$), and replace either match with nothing. Gratuitous online explanation of the regex. Use that with REReplace (for instance) with scope set to all and a blank string as the replacement.
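For readers who want to try the expression outside ColdFusion, here is the same idea as a short Python sketch. strip_outer_brackets is a hypothetical helper name; re.sub replaces every match, which corresponds to the "all" scope mentioned above.

```python
import re

# Leading whitespace plus "[" at the start, or "]" plus trailing whitespace at the end.
OUTER_BRACKETS = re.compile(r"^\s*\[|]\s*$")


def strip_outer_brackets(text: str) -> str:
    """Remove the opening [ and closing ] that wrap a JSON array string."""
    return OUTER_BRACKETS.sub("", text)


wrapped = '  [{"s": "1", "f": "user1"}, {"s": "1", "f": "user2"}]\n'
print(strip_outer_brackets(wrapped))
# {"s": "1", "f": "user1"}, {"s": "1", "f": "user2"}
```

Because both alternatives are anchored, brackets that appear inside the data are left alone.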