I am working with JSON data, and the response comes back fine:
[
{
"s": "1",
"sent": "September, 11 2014 18:19:10 -0400",
"f": "user1",
"m": "the place",
"fr": "user2"
},
{
"s": "1",
"sent": "September, 11 2014 18:19:19 -0400",
"f": "user2",
"m": "that once decided",
"fr": "user1"
},
{
"s": "1",
"sent": "September, 11 2014 18:19:23 -0400",
"f": "user1",
"m": "on your side",
"fr": "user2"
},
{
"s": "1",
"sent": "September, 11 2014 18:19:43 -0400",
"f": "user2",
"m": "actually i moved the text",
"fr": "user1"
},
{
"s": "1",
"sent": "September, 11 2014 18:20:06 -0400",
"f": "user2",
"m": "nothing specific",
"fr": "user1"
}
]
I would like to remove the leading [ and the trailing ] from the JSON. Currently I use two Replace calls to do it, but that works only some of the time. What regex would handle this reliably? Please guide.
Regards
You can do it with one replace using a regex with an alternation (|): ^\s*\[|]\s*$. That says: match any run of whitespace followed by [ at the beginning of the string (^\s*\[), or match ] followed by any run of whitespace at the end of the string (]\s*$). Use that with REReplace (for instance) with scope set to all and an empty string as the replacement, so that whichever part matches is replaced with nothing.
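As a rough illustration of the same alternation outside ColdFusion, here is a minimal Python sketch. The sample string and the names OUTER_BRACKETS and stripped are made up for the example; only the pattern comes from the answer above.

```python
import re

# Hypothetical sample: a JSON array with whitespace around the brackets.
raw = '  [\n{"s": "1", "m": "the place"},\n{"s": "1", "m": "on [your] side"}\n]  \n'

# Either a leading "[" (with any whitespace before it) or a trailing "]"
# (with any whitespace after it) is replaced by an empty string in one pass.
OUTER_BRACKETS = re.compile(r'^\s*\[|\]\s*$')
stripped = OUTER_BRACKETS.sub('', raw)

print(stripped)
```

Because both alternatives are anchored (^ and $), brackets inside the data, such as [your] in the sample, are left alone.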
The output is JSON containing the whole text and each speaker's segments, like so:
"speaker_labels": {
"speakers": 2,
"segments": [
{
"start_time": "0.94",
"speaker_label": "spk_0",
"end_time": "3.065",
"items": [
{
"start_time": "1.01",
"speaker_label": "spk_0",
"end_time": "1.22"
},
.
.
.
and then each word with its timestamps:
"items": [
{
"start_time": "1.01",
"end_time": "1.22",
"alternatives": [{ "confidence": "1.0", "content": "word" }],
"type": "pronunciation"
},
{
"start_time": "1.22",
"end_time": "1.81",
"alternatives": [{ "confidence": "1.0", "content": "word" }],
"type": "pronunciation"
},
{
"alternatives": [{ "confidence": "0.0", "content": "another word" }],
"type": "punctuation"
},
.
.
.
I need each speaker's words. How am I supposed to get that data without writing a lot of logic that compares the start and end times of the words against the start and end times of the speaker segments?
The result I expect:
spk_0 : words words words
spk_1 : words words more words
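One possible approach, sketched here in Python under assumptions: each entry in a segment's "items" list carries the same "start_time" string as the matching word entry (as "1.01" does in the excerpts above), so the speaker can be looked up by start time instead of comparing time ranges. The function name words_by_speaker and the sample data are hypothetical.

```python
from collections import defaultdict

def words_by_speaker(speaker_segments, word_items):
    # Build a lookup: word start_time -> speaker label (assumes exact string match).
    speaker_at = {}
    for segment in speaker_segments:
        for item in segment["items"]:
            speaker_at[item["start_time"]] = item["speaker_label"]

    words = defaultdict(list)
    last_speaker = None
    for item in word_items:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation has no timestamps; attach it to the previous word.
            if last_speaker is not None and words[last_speaker]:
                words[last_speaker][-1] += content
            continue
        last_speaker = speaker_at.get(item["start_time"], last_speaker)
        if last_speaker is not None:
            words[last_speaker].append(content)
    return {speaker: " ".join(w) for speaker, w in words.items()}

# Hypothetical data shaped like the excerpts above.
segments = [
    {"speaker_label": "spk_0", "items": [
        {"start_time": "1.01", "speaker_label": "spk_0", "end_time": "1.22"},
        {"start_time": "1.22", "speaker_label": "spk_0", "end_time": "1.81"}]},
    {"speaker_label": "spk_1", "items": [
        {"start_time": "3.5", "speaker_label": "spk_1", "end_time": "4.2"}]},
]
items = [
    {"start_time": "1.01", "end_time": "1.22", "type": "pronunciation",
     "alternatives": [{"confidence": "1.0", "content": "hello"}]},
    {"start_time": "1.22", "end_time": "1.81", "type": "pronunciation",
     "alternatives": [{"confidence": "1.0", "content": "there"}]},
    {"type": "punctuation", "alternatives": [{"confidence": "0.0", "content": "."}]},
    {"start_time": "3.5", "end_time": "4.2", "type": "pronunciation",
     "alternatives": [{"confidence": "1.0", "content": "hi"}]},
]
result = words_by_speaker(segments, items)
for speaker, text in result.items():
    print(speaker, ":", text)
```

If the start times in the two lists do not match exactly in your data, this lookup will not work and a range comparison would be needed instead.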
I need to find all the results that start with a certain input, for example for the inputs "Paul", "pau", "paul Gr", "Paul Green", "Paul Gree", "Pel", "pele", "joh", "john", etc. The search has to be case-insensitive.
It is supposed to return all of these (the input search string is at least 3 characters long):
[
{
"_id": ObjectId("5e6ffe413f71835ae3aa4b60"),
"f": "Paul",
"id": 11811,
"l": "Pelè",
"r": 64
},
{
"_id": ObjectId("5e6ffe413f71835ae3aa4b65"),
"f": "paul",
"id": 11811,
"l": "walker",
"r": 64
},
{
"_id": ObjectId("5e6ffe413f71835ae3aa4b66"),
"f": "johnny",
"id": 11811,
"l": "Green",
"r": 64
}
]
I tried the following:
contain_searched_term_players = list(db.players_collection.find({'$or': [{'f': {'$regex': searched_player_name_string, '$options': 'i'}},
                                                                         {'l': {'$regex': searched_player_name_string, '$options': 'i'}},
                                                                         {'c': {'$regex': searched_player_name_string, '$options': 'i'}}]}).sort([('r', -1)]))
but it doesn't work for "Paul Green".
searched_player_name_string is the given input (one of the inputs above, for example Paul Green).
You need to provide a correct regex for the query condition:
^(Paul Green|Paul Gree|Paul|paul|pau|Gr|pele|Pel|john|joh)
RegexPlayground
searched_player_name_string = "^(Paul Green|Paul Gree|Paul|paul|pau|Gr|pele|Pel|john|joh)"
result_cursor = db.players_collection.find({
"$or": [
{
"f": {
"$regex": searched_player_name_string,
"$options": "i"
}
},
{
"l": {
"$regex": searched_player_name_string,
"$options": "i"
}
},
{
"c": {
"$regex": searched_player_name_string,
"$options": "i"
}
}
]
})
found_players = list(result_cursor)
MongoPlayground
Split your input into separate strings, run the query on each, and append the results together (checking first that the player has not already been found). Finally, sort the results:
import pprint

searched_player_name_string = 'Paul Green'
found_players = []
for regex in searched_player_name_string.split():
    contain_searched_term_players = db.players_collection.find({'$or': [{'f': {'$regex': regex, '$options': 'i'}},
                                                                        {'l': {'$regex': regex, '$options': 'i'}},
                                                                        {'c': {'$regex': regex, '$options': 'i'}}]})
    for player in contain_searched_term_players:
        # The next line avoids creating duplicate answers if there are multiple matches for the same player
        if player['_id'] not in [o['_id'] for o in found_players]:
            found_players.append(player)
# Sort the output by "r" - highest first
pprint.pprint(sorted(found_players, key=lambda o: o['r'], reverse=True))
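Since the question asks for names that start with the input, it may help to anchor and escape each word before using it as a regex. The sketch below shows that idea on in-memory dictionaries so it runs without a database; the helper names prefix_patterns and matches, and the last sample player, are made up for illustration. The same '^' + re.escape(word) string could presumably be passed as the '$regex' value in the queries above, but that is an assumption to verify against your data.

```python
import re

def prefix_patterns(search_text):
    """Return one anchored, escaped, case-insensitive pattern per input word."""
    return [re.compile('^' + re.escape(word), re.IGNORECASE)
            for word in search_text.split()]

def matches(player, search_text, fields=('f', 'l', 'c')):
    """True if any input word is a prefix of any of the player's name fields."""
    patterns = prefix_patterns(search_text)
    return any(p.search(str(player.get(field, '')))
               for p in patterns for field in fields)

# Sample documents modelled on the question (the last one is hypothetical).
players = [
    {'f': 'Paul', 'l': 'Pelè', 'r': 64},
    {'f': 'paul', 'l': 'walker', 'r': 64},
    {'f': 'johnny', 'l': 'Green', 'r': 64},
    {'f': 'Anna', 'l': 'Smith', 'r': 10},
]

found = [p for p in players if matches(p, 'Paul Green')]
for player in sorted(found, key=lambda p: p['r'], reverse=True):
    print(player['f'], player['l'])
```

Escaping matters because user input such as "a.b" or "(" would otherwise be interpreted as regex syntax.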
Response:
{
"service_name": "signup",
"message": "Sign Up has been done successfully",
"global_error": "",
"error": [],
"data": {
"session_key": "8f29d7c93e7089841208e94a7d98fc22",
"user_profile": {
"user_id": 65,
"user_unique_id": "e9a03a8ede",
"dob": "Dec 06, 1998",
"first_name": "FC7155313",
"last_name": "FC1791398",
"user_name": "FCwqim178",
"email": "fc_slekjbp#mailinator.com",
"phone_no": "3362239492",
"balance": "0",
"status": "2",
"image": "http://dummy.projects.com/app/assets/img/default_user.png",
"currency": "$",
"profile_status": 1,
"require_otp": false,
"existing_user": 0,
"master_country_id": null,
"master_state_id": "3919"
},
"verification_link": "http://dummy.projects.com/activation/ZTlhMDNhOGVkZV8xNTY3Mjk3MjAw",
"send_email_otp": false,
"send_phone_otp": false
},
"response_code": 200
}
I am using JMeter and want to pass the value "ZTlhMDNhOGVkZV8xNTY3Mjk3MjAw" to the next API call.
This value is generated dynamically for each new user registration.
But my regex is not working.
The regular expression that I wrote:
,"verification_link":"http://dummy.projects.com/activation/(.+?)","send_email_otp":
Template: $1$
Match No.: 1
Your response seems to be a JSON entity, so there is a high chance that it actually looks like this:
{
"service_name": "signup",
"message": "Sign Up has been done successfully",
"global_error": "",
"error": [
],
"data": {
"session_key": "8f29d7c93e7089841208e94a7d98fc22",
"user_profile": {
"user_id": 65,
"user_unique_id": "e9a03a8ede",
"dob": "Dec 06, 1998",
"first_name": "FC7155313",
"last_name": "FC1791398",
"user_name": "FCwqim178",
"email": "fc_slekjbp#mailinator.com",
"phone_no": "3362239492",
"balance": "0",
"status": "2",
"image": "http://dummy.projects.com/app/assets/img/default_user.png",
"currency": "$",
"profile_status": 1,
"require_otp": false,
"existing_user": 0,
"master_country_id": null,
"master_state_id": "3919"
},
"verification_link": "http://dummy.projects.com/activation/ZTlhMDNhOGVkZV8xNTY3Mjk3MjAw",
"send_email_otp": false,
"send_phone_otp": false
},
"response_code": 200
}
so this "send_email_otp" bit can easily go to the next line and your regular expression will not match anything in this situation.
I would recommend amending your regex to look something like:
"verification_link":\s?"http://dummy.projects.com/activation/(\w+)"
References:
JMeter: Regular Expressions
Using RegEx (Regular Expression Extractor) with JMeter
Perl 5 Regex Cheat sheet
Your regex depends on the exact text surrounding the URL, so it fails to capture the part that you actually want. Try this instead:
http:\/\/dummy\.projects\.com\/activation\/(.+)\",
Regex Demo
The (.+?) part is a lazy match: it matches one or more characters, as few as possible, and keeps extending only until the rest of the pattern can match. Here the rest of the pattern is ","send_email_otp": with no whitespace between the tokens, and the formatted response never contains that exact text, so the whole match fails and you do not get the value you want.
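To see the difference outside JMeter, here is a small Python sketch. It only illustrates the pattern logic: JMeter uses its own regex engine, and the trimmed response_text, TOKEN, and the variable names are made up for the example. It also shows the alternative of parsing the JSON rather than matching it.

```python
import json
import re

# Trimmed, pretty-printed version of the response from the question.
response_text = '''{
  "data": {
    "verification_link": "http://dummy.projects.com/activation/ZTlhMDNhOGVkZV8xNTY3Mjk3MjAw",
    "send_email_otp": false
  },
  "response_code": 200
}'''

# The original pattern requires ","send_email_otp": with no whitespace, so it fails.
strict = re.search(r'"verification_link":"http://dummy\.projects\.com/activation/(.+?)","send_email_otp":',
                   response_text)

# Tolerate optional whitespace around the colon and stop at the closing quote.
TOKEN = re.compile(r'"verification_link"\s*:\s*"http://dummy\.projects\.com/activation/([^"]+)"')
match = TOKEN.search(response_text)
token_from_regex = match.group(1) if match else None

# Alternative: parse the JSON and take the last path segment of the link.
link = json.loads(response_text)["data"]["verification_link"]
token_from_json = link.rsplit("/", 1)[-1]

print(strict, token_from_regex, token_from_json)
```

In JMeter itself, a JSON-aware extractor may be a more robust choice than a regex for this kind of response.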
Let's say I have some text like this:
{
"source": "Analytics 13 {Employee_Info.acl} {Employee_Data}",
"lastRecNo": "3",
"columns": {
"ID": "numeric",
"NAME": "character",
"EFFECTIVE_DATE": "date",
"ROLE": "character"
},
"data": [{
"ID": 1,
"NAME": "Bill Smith",
"EFFECTIVE_DATE": "2018-10-01",
"ROLE": "Director {Regional},{Call Center}"
},
{
"ID": 2,
"NAME": "Ellen Jones",
"EFFECTIVE_DATE": "2018-07-01",
"ROLE": "Manager"
},
{
"ID": 3,
"NAME": "Sam Edwards",
"EFFECTIVE_DATE": "2018-09-01",
"ROLE": "Supervisor"
}]
}
And I want to RegEx match every object inside the "data" array (including the curly braces).
So the first match would be:
{
"ID": 1,
"NAME": "Bill Smith",
"EFFECTIVE_DATE": "2018-10-01",
"ROLE": "Director {Regional},{Call Center}"
}
the second would be:
{
"ID": 2,
"NAME": "Ellen Jones",
"EFFECTIVE_DATE": "2018-07-01",
"ROLE": "Manager"
}
and the third would be
{
"ID": 3,
"NAME": "Sam Edwards",
"EFFECTIVE_DATE": "2018-09-01",
"ROLE": "Supervisor"
}
What regex pattern would I use to do that in PowerShell?
Notice the first match actually has some extra curly braces in the text of the "ROLE" field, which shouldn't interfere with the match.
I've tried this so far '(?<={).*?(?=})', but the first match is:
"source": "Analytics 13 {Employee_Info.acl
This result isn't part of the "data" array, and it doesn't include the curly braces in the match. I know I'm missing something that says "make sure we are inside the brackets of the "data" array", and I'm probably not taking into account the extra curly braces in the "ROLE" field of the first object in the "data" array, which I want to ignore.
Your task can be easily done using ConvertFrom-Json and ConvertTo-Json cmdlets.
Here is a brief example:
First, read the text file content into a variable.
$JSON = @"
[
{
"source": "Analytics 13 {Employee_Info.acl} {Employee_Data}",
"lastRecNo": "3",
"columns": {
"ID": "numeric",
"NAME": "character",
"EFFECTIVE_DATE": "date",
"ROLE": "character"
},
"data": [{
"ID": 1,
"NAME": "Bill Smith",
"EFFECTIVE_DATE": "2018-10-01",
"ROLE": "Director {Regional},{Call Center}"
},
{
"ID": 2,
"NAME": "Ellen Jones",
"EFFECTIVE_DATE": "2018-07-01",
"ROLE": "Manager"
},
{
"ID": 3,
"NAME": "Sam Edwards",
"EFFECTIVE_DATE": "2018-09-01",
"ROLE": "Supervisor"
}]
}
]
"@
Then you just perform converting from JSON using ConvertFrom-Json cmdlet.
ConvertFrom-Json -InputObject $JSON
Output:
source lastRecNo columns data
------ --------- ------- ----
Analytics 13 {Employee_Info.acl} {Employee_Data} 3 @{ID=numeric; NAME=character; EFFECTIVE_DATE=date; ROLE=character} {@{ID=1; NAME=Bill Smith; EFFECTIVE_DATE=2018-10-01; ROLE=Director {Regional},{Call Center}}, @...
You can then convert the items in data back to JSON using the ConvertTo-Json cmdlet. All together:
$PSObject = ConvertFrom-Json -InputObject $JSON
foreach ($item in $PSObject.data) {
    ConvertTo-Json $item
}
Output:
{
"ID": 1,
"NAME": "Bill Smith",
"EFFECTIVE_DATE": "2018-10-01",
"ROLE": "Director {Regional},{Call Center}"
}
{
"ID": 2,
"NAME": "Ellen Jones",
"EFFECTIVE_DATE": "2018-07-01",
"ROLE": "Manager"
}
{
"ID": 3,
"NAME": "Sam Edwards",
"EFFECTIVE_DATE": "2018-09-01",
"ROLE": "Supervisor"
}
You can now add filter conditions for the data items inside the foreach loop.
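The parse-then-serialize idea is not specific to PowerShell. As a point of comparison, here is a minimal Python sketch of the same approach using the standard json module; the shortened sample text and the variable names are made up for the example.

```python
import json

# Shortened version of the sample from the question.
text = '''{
  "source": "Analytics 13 {Employee_Info.acl} {Employee_Data}",
  "data": [
    {"ID": 1, "NAME": "Bill Smith", "ROLE": "Director {Regional},{Call Center}"},
    {"ID": 2, "NAME": "Ellen Jones", "ROLE": "Manager"}
  ]
}'''

# Parse once, then serialize each object in "data" back to its own JSON string.
records = json.loads(text)["data"]
objects_as_text = [json.dumps(record, indent=4) for record in records]

for obj in objects_as_text:
    print(obj)
```

Because the parser understands string literals, the curly braces inside "ROLE" and "source" cannot interfere, which is the main reason to prefer this over a regex.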
In the example snippet below I have some JSON which needs to be edited (over 1400 entries). I need to achieve 2 things:
In this example line: "phone": "+44 2079693900", I need to remove the whitespace between +44 and 2079693900 but for all records. Resulting in: "+442079693900"
For latitude and longitude I need to get rid of the double quotes around the numbers, as the API I am using only accepts these values as floats.
Example: "latitude": "51.51736", needs to be: "latitude": 51.51736
I am most familiar with Ruby, and have done some parsing of JSON with this in the past, but I thought Regex would be the best tool to use for this kind of basic data cleaning task. I have referred to regex101.com and regular-expressions.info but I'm pretty stuck at this point. Thanks in advance!
[
{
"id": "101756",
"name": "1 Lombard Street",
"email": "reception#1lombardstreet.com",
"website": "http://www.1lombardstreet.com",
"location": {
"latitude": "51.5129",
"longitude": "-0.089",
"address": {
"line1": "1 Lombard Street",
"line2": "",
"line3": "",
"postcode": "EC3V 9AA",
"city": "London",
"country": "UK"
}
}
},
{
"id": "105371",
"name": "108 Brasserie",
"phone": "+44 2079693900",
"email": "enquiries#108marylebonelane.com",
"website": "http://www.108brasserie.com",
"location": {
"latitude": "51.51795",
"longitude": "-0.15079",
"address": {
"line1": "108 Marylebone Lane",
"line2": "",
"line3": "",
"postcode": "W1U 2QE",
"city": "London",
"country": "UK"
}
}
},
{
"id": "108701",
"name": "1901 Restaurant",
"phone": "+44 2076187000",
"email": "london.restres#andaz.com",
"website": "http://www.andazdining.com",
"location": {
"latitude": "51.51736",
"longitude": "-0.08123",
"address": {
"line1": "Andaz Hotel",
"line2": "40 Liverpool Street",
"line3": "",
"postcode": "EC2M 7QN",
"city": "London",
"country": "UK"
}
}
},
{
"id": "102190",
"name": "2 Bridge Place",
"phone": "+44 2078028555",
"email": "fb#dtlondonvictoria.com",
"website": "http://crimsonhotels.comdoubletreelondonvictoriadiningpre-theatre-dining",
"location": {
"latitude": "51.49396",
"longitude": "-0.14343",
"address": {
"line1": "2 Bridge Place",
"line2": "Victoria",
"line3": "",
"postcode": "SW1V 1QA",
"city": "London",
"country": "UK"
}
}
},
{
"id": "102063",
"name": "2 Veneti",
"phone": "+44 2076370789",
"email": "2veneti#btconnect.com",
"website": "http://www.2veneti.com",
"location": {
"latitude": "51.5168",
"longitude": "-0.14673",
"address": {
"line1": "10 Wigmore Street",
"line2": "",
"line3": "",
"postcode": "W1U 2RD",
"city": "London",
"country": "UK"
}
}
},
You can use the following regex:
("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"
With the following replacement:
$1$2$3
The idea is to capture what we want and not capture what we do not, and then use backreferences to restore the substrings we want to keep.
Regex explanation:
The pattern contains 2 alternatives joined with | alternation operator:
("phone":\s*"\+44)\s+:
("phone":\s*"\+44) - the 1st capturing group matching literal "phone": + optional whitespace, then an opening " and +44 literally
\s+ - 1 or more whitespaces that we'll remove
("(?:latitude|longitude)":\s*)"([^"]+)":
("(?:latitude|longitude)":\s*) - the second capturing group matching "latitude": or "longitude": and 0 or more whitespace characters
" - Literal " that we'll drop
([^"]+) - the third capturing group matching 1 or more characters other than " (we'll keep that)
" - again, a literal " that we'll drop.
See demo
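For reference, the pattern and replacement above can be tried with a short Python sketch. The sample string and the names CLEANUP and clean are made up for the example; a replacement function is used instead of $1$2$3 so that groups from the alternative that did not match are simply treated as empty.

```python
import re

CLEANUP = re.compile(
    r'("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"'
)

def clean(text):
    # Re-emit only the captured groups; a group that did not take part
    # in the match is None and contributes nothing.
    return CLEANUP.sub(lambda m: ''.join(g or '' for g in m.groups()), text)

# Hypothetical fragment shaped like the records in the question.
sample = '"phone": "+44 2079693900",\n"latitude": "51.51736",\n"longitude": "-0.08123",'
cleaned = clean(sample)
print(cleaned)
```

Since the author mentions Ruby, the same pattern should carry over to gsub, though parsing the JSON and converting the fields explicitly would also avoid any regex edge cases.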