I need to extract the following four-character matches from the strings below:
Sites:
KPAE
KPSC
KPUW
KRNT
KSEA
KSFF
KSHN
KTIW
Data:
{"sids": ["24222 1", "452670 2", "PAE 3", "KPAE 5", "USW00024222 6"], "name": "EVERETT SNOHOMISH AP"},
{"sids": ["24163 1", "PSC 3", "KPSC 5", "USW00024163 6"], "name": "PASCO TRI CITIES AP"},
{"sids": ["94129 1", "PUW 3", "KPUW 5", "USW00094129 6"], "name": "PULLMAN MOSCOW RGNL AP"},
{"sids": ["94248 1", "RNT 3", "KRNT 5", "USW00094248 6"], "name": "RENTON MUNI AP"},
{"sids": ["24233 1", "457473 2", "SEA 3", "72793 4", "KSEA 5", "USW00024233 6", "SEA 7"], "name": "SEATTLE TACOMA INTL AP"},
{"sids": ["94176 1", "SFF 3", "KSFF 5", "USW00094176 6"], "name": "SPOKANE FELTS FLD"},
{"sids": ["94227 1", "457585 2", "SHN 3", "KSHN 5", "USW00094227 6", "SHN 7"], "name": "SHELTON SANDERSON FLD"},
{"sids": ["94274 1", "TIW 3", "KTIW 5", "USW00094274 6"], "name": "TACOMA NARROWS AP"},
I have tried to extract these matches from the strings, but their position can change from string to string...
Attempted Code:
awk -F',' '{print $5}'
Using grep -oP:
grep -Po '"\K[A-Z]{4}\b' file
KPAE
KPSC
KPUW
KRNT
KSEA
KSFF
KSHN
KTIW
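The -P switch enables Perl-compatible regular expressions here: \K drops the opening quote from the reported match, and \b requires the four uppercase letters to end at a word boundary, so longer names such as EVERETT are never matched. If your grep lacks -P (stock BSD/macOS grep, for example), a rough equivalent is to keep the delimiters in the match and strip them afterwards; this is just a sketch and assumes each code is exactly four uppercase letters followed by a space or a closing quote:
# Portable variant: match the leading quote and the trailing space/quote as well,
# then delete both with tr
grep -oE '"[A-Z]{4}[ "]' file | tr -d '" '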
I am trying to query data in a JSON file using S3 Select:
{
"groups_id":
{
"307225":
{
"created_at": "2015-02-10T17:24:15-08:00",
"updated_at": "2017-09-06T17:25:22-07:00",
"name": "Company 1",
"company": true,
"contact_name": "User 1",
"email": "",
"phone_number": "",
"address": "",
"website": "",
"notes": "Testing",
"id": "307225"
},
"1058565":
{
"created_at": "2017-04-02T23:44:10-07:00",
"updated_at": "2017-07-18T17:39:21-07:00",
"name": "Company 3",
"company": true,
"contact_name": "User 1",
"email": "",
"phone_number": "",
"address": "",
"website": "",
"notes": null,
"id": "1058565"
}
}
}
Can someone help me get the desired output below using S3 Select, based on the condition WHERE contact_name = 'User 1'?
{"id": "307225", "name": "Company 1"},
{"id": "1058565","name": "Company 3"}
Below are the queries we have tried:
Select s.groups_id['307225'].id from s3object s
Select s.groups_id['1058565'].id from s3object s
In the above queries we hardcoded the group_id, and we are able to fetch the following:
{
"id": "1058565"
}
But in our case the group_id is dynamic, so I am not sure how to handle that.
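For reference, here is a rough sketch of how one of the hardcoded queries above can be issued from Python with boto3's select_object_content; it does not solve the dynamic-key problem, it only shows the call shape, and the bucket and key names are hypothetical:
import boto3

s3 = boto3.client("s3")
response = s3.select_object_content(
    Bucket="my-bucket",   # hypothetical bucket name
    Key="groups.json",    # hypothetical object key
    ExpressionType="SQL",
    Expression="SELECT s.groups_id['307225'].id FROM s3object s",
    InputSerialization={"JSON": {"Type": "DOCUMENT"}},
    OutputSerialization={"JSON": {}},
)
# The result arrives as an event stream; Records events carry the selected rows
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())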
Let's say I have some text like this:
{
"source": "Analytics 13 {Employee_Info.acl} {Employee_Data}",
"lastRecNo": "3",
"columns": {
"ID": "numeric",
"NAME": "character",
"EFFECTIVE_DATE": "date",
"ROLE": "character"
},
"data": [{
"ID": 1,
"NAME": "Bill Smith",
"EFFECTIVE_DATE": "2018-10-01",
"ROLE": "Director {Regional},{Call Center}"
},
{
"ID": 2,
"NAME": "Ellen Jones",
"EFFECTIVE_DATE": "2018-07-01",
"ROLE": "Manager"
},
{
"ID": 3,
"NAME": "Sam Edwards",
"EFFECTIVE_DATE": "2018-09-01",
"ROLE": "Supervisor"
}]
}
And I want to RegEx match every object inside the "data" array (including the curly braces).
So the first match would be:
{
"ID": 1,
"NAME": "Bill Smith",
"EFFECTIVE_DATE": "2018-10-01",
"ROLE": "Director {Regional},{Call Center}"
}
the second would be:
{
"ID": 2,
"NAME": "Ellen Jones",
"EFFECTIVE_DATE": "2018-07-01",
"ROLE": "Manager"
}
and the third would be
{
"ID": 3,
"NAME": "Sam Edwards",
"EFFECTIVE_DATE": "2018-09-01",
"ROLE": "Supervisor"
}
What regex pattern would I use to do that in PowerShell?
Notice the first match actually has some extra curly braces in the text of the "ROLE" field, which shouldn't interfere with the match.
I've tried this so far '(?<={).*?(?=})', but the first match is:
"source": "Analytics 13 {Employee_Info.acl
This result isn't part of the "data" array, and it doesn't include the curly braces in the match. I know I'm missing something that says "make sure we are inside the brackets/"data" array", and I'm probably not accounting for the extra curly braces in the "ROLE" field of the first object in the "data" array, which I want to ignore.
Your task can easily be done using the ConvertFrom-Json and ConvertTo-Json cmdlets.
Here is a brief example:
First, read the text file content into a variable:
$JSON = @"
[
{
"source": "Analytics 13 {Employee_Info.acl} {Employee_Data}",
"lastRecNo": "3",
"columns": {
"ID": "numeric",
"NAME": "character",
"EFFECTIVE_DATE": "date",
"ROLE": "character"
},
"data": [{
"ID": 1,
"NAME": "Bill Smith",
"EFFECTIVE_DATE": "2018-10-01",
"ROLE": "Director {Regional},{Call Center}"
},
{
"ID": 2,
"NAME": "Ellen Jones",
"EFFECTIVE_DATE": "2018-07-01",
"ROLE": "Manager"
},
{
"ID": 3,
"NAME": "Sam Edwards",
"EFFECTIVE_DATE": "2018-09-01",
"ROLE": "Supervisor"
}]
}
]
"#
Then just convert from JSON using the ConvertFrom-Json cmdlet:
ConvertFrom-Json -InputObject $JSON
Output:
source lastRecNo columns data
------ --------- ------- ----
Analytics 13 {Employee_Info.acl} {Employee_Data} 3 #{ID=numeric; NAME=character; EFFECTIVE_DATE=date; ROLE=character} {#{ID=1; NAME=Bill Smith; EFFECTIVE_DATE=2018-10-01; ROLE=Director {Regional},{Call Center}}, #...
You can then turn the items in data back into JSON using the ConvertTo-Json cmdlet. All together:
$PSObject = ConvertFrom-Json -InputObject $JSON
foreach ($item in $PSObject.data){
ConvertTo-Json $item
}
Output:
{
"ID": 1,
"NAME": "Bill Smith",
"EFFECTIVE_DATE": "2018-10-01",
"ROLE": "Director {Regional},{Call Center}"
}
{
"ID": 2,
"NAME": "Ellen Jones",
"EFFECTIVE_DATE": "2018-07-01",
"ROLE": "Manager"
}
{
"ID": 3,
"NAME": "Sam Edwards",
"EFFECTIVE_DATE": "2018-09-01",
"ROLE": "Supervisor"
}
You can now add filter conditions for the data items in the foreach loop, as shown below.
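For example, a minimal sketch of such a filter, reusing $PSObject from above (the property name ROLE and the value Manager are simply taken from the sample data):
foreach ($item in $PSObject.data) {
    # Emit only the data items that satisfy the condition
    if ($item.ROLE -eq 'Manager') {
        ConvertTo-Json $item
    }
}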
products = [
{"id":1, "name": "Chocolate Sandwich Cookies", "department": "snacks", "aisle": "cookies cakes", "price": 3.50},
{"id":2, "name": "All-Seasons Salt", "department": "pantry", "aisle": "spices seasonings", "price": 4.99},
{"id":3, "name": "Robust Golden Unsweetened Oolong Tea", "department": "beverages", "aisle": "tea", "price": 2.49},
{"id":4, "name": "Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce", "department": "frozen", "aisle": "frozen meals", "price": 6.99},
{"id":5, "name": "Green Chile Anytime Sauce", "department": "pantry", "aisle": "marinades meat preparation", "price": 7.99},
{"id":6, "name": "Dry Nose Oil", "department": "personal care", "aisle": "cold flu allergy", "price": 21.99},
{"id":7, "name": "Pure Coconut Water With Orange", "department": "beverages", "aisle": "juice nectars", "price": 3.50},
{"id":8, "name": "Cut Russet Potatoes Steam N' Mash", "department": "frozen", "aisle": "frozen produce", "price": 4.25},
{"id":9, "name": "Light Strawberry Blueberry Yogurt", "department": "dairy eggs", "aisle": "yogurt", "price": 6.50},
{"id":10, "name": "Sparkling Orange Juice & Prickly Pear Beverage", "department": "beverages", "aisle": "water seltzer sparkling water", "price": 2.99},
{"id":11, "name": "Peach Mango Juice", "department": "beverages", "aisle": "refrigerated", "price": 1.99},
{"id":12, "name": "Chocolate Fudge Layer Cake", "department": "frozen", "aisle": "frozen dessert", "price": 18.50},
{"id":13, "name": "Saline Nasal Mist", "department": "personal care", "aisle": "cold flu allergy", "price": 16.00},
{"id":14, "name": "Fresh Scent Dishwasher Cleaner", "department": "household", "aisle": "dish detergents", "price": 4.99},
{"id":15, "name": "Overnight Diapers Size 6", "department": "babies", "aisle": "diapers wipes", "price": 25.50},
{"id":16, "name": "Mint Chocolate Flavored Syrup", "department": "snacks", "aisle": "ice cream toppings", "price": 4.50},
{"id":17, "name": "Rendered Duck Fat", "department": "meat seafood", "aisle": "poultry counter", "price": 9.99},
{"id":18, "name": "Pizza for One Suprema Frozen Pizza", "department": "frozen", "aisle": "frozen pizza", "price": 12.50},
{"id":19, "name": "Gluten Free Quinoa Three Cheese & Mushroom Blend", "department": "dry goods pasta", "aisle": "grains rice dried goods", "price": 3.99},
{"id":20, "name": "Pomegranate Cranberry & Aloe Vera Enrich Drink", "department": "beverages", "aisle": "juice nectars", "price": 4.25}
]
product_ids = []
while True:
    product_id = input("Hey please input a product identifier: ")
    if product_id == "done":
        break
    else:
        product_ids.append(product_id)

def lookup_product_by_id(product_id):
    matching_products = [product for product in products if product["id"] == product_id]
    return matching_products[0]

raw_total = 0
for product_id in product_ids:
    product = lookup_product_by_id(product_id)
    raw_total += product["price"]
    print(str(product["id"]) + " " + product["name"] + " " + str(product["price"]))
It says 'list index out of range' and I don't know what is wrong. How can I solve this problem? I don't understand how 'return matching_products[0]' and 'product = lookup_product_by_id(product_id)' interact.
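A likely cause, reading the code above: in Python 3, input() returns a string, while the "id" values in products are integers, so the list comprehension never finds a match and matching_products[0] raises IndexError on an empty list. A minimal sketch of a fix, keeping the rest of the code as-is, converts the entered id to an int and guards against ids that don't exist:
def lookup_product_by_id(product_id):
    # input() gives a string, but the "id" values are ints, so convert before comparing
    matching_products = [product for product in products if product["id"] == int(product_id)]
    if not matching_products:
        return None  # no product with that id
    return matching_products[0]

raw_total = 0
for product_id in product_ids:
    product = lookup_product_by_id(product_id)
    if product is None:
        print("No product found with id " + product_id)
        continue
    raw_total += product["price"]
    print(str(product["id"]) + " " + product["name"] + " " + str(product["price"]))
Entering something non-numeric would still raise ValueError from int(); wrap the conversion in a try/except if that needs handling.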
I have a loopback.io API with a MongoDB database.
I have a model called Project with 2 hasMany relationships
Project has many ProjectArticle objects
Project has many ProjectWorkHours objects
When I make this request: http://0.0.0.0:3000/api/Projects?filter[include][projectArticles]
I get the following response, which is good:
{
"title": "Project 1",
"description": "My project number one",
"dateCreated": "2015-10-31T00:00:00.000Z",
"id": "5634b3af340faf570c7e70a8",
"projectArticles": [
{
"articleName": "brick",
"quantity": 5,
"unitPrice": 2,
"id": "5634b9ea5ab833960c8fbf6d",
"projectId": "5634b3af340faf570c7e70a8"
}
]
}
And when I request http://0.0.0.0:3000/api/Projects?filter[include][workHours], I get the right answer:
{
"title": "Project 1",
"description": "My project number one",
"dateCreated": "2015-10-31T00:00:00.000Z",
"id": "5634b3af340faf570c7e70a8",
"workHours": [
{
"description": "blabla",
"startDate": "2015-10-31T00:00:00.000Z",
"endDate": "2015-10-31T00:00:00.000Z",
"id": "5634e11d5f6471f10d2e0dd7",
"workHourId": "5634b3af340faf570c7e70a8"
},
{
"description": "blabla 2",
"startDate": "2015-10-31T00:00:00.000Z",
"endDate": "2015-10-31T00:00:00.000Z",
"id": "5634e1265f6471f10d2e0dd8",
"workHourId": "5634b3af340faf570c7e70a8"
}
]
}
But how do I combine the two includes in one request, so that I get the following result?
{
"title": "Project 1",
"description": "My project number one",
"dateCreated": "2015-10-31T00:00:00.000Z",
"id": "5634b3af340faf570c7e70a8",
"workHours": [
{
"description": "blabla",
"startDate": "2015-10-31T00:00:00.000Z",
"endDate": "2015-10-31T00:00:00.000Z",
"id": "5634e11d5f6471f10d2e0dd7",
"workHourId": "5634b3af340faf570c7e70a8"
},
{
"description": "blabla 2",
"startDate": "2015-10-31T00:00:00.000Z",
"endDate": "2015-10-31T00:00:00.000Z",
"id": "5634e1265f6471f10d2e0dd8",
"workHourId": "5634b3af340faf570c7e70a8"
}
],
"projectArticles": [
{
"articleName": "brick",
"quantity": 5,
"unitPrice": 2,
"id": "5634b9ea5ab833960c8fbf6d",
"projectId": "5634b3af340faf570c7e70a8"
}
]
}
I tried this: http://0.0.0.0:3000/api/Projects?filter[include][workHours]&[include][projectArticles], but it only applies the first include; the second one is more or less ignored.
Any ideas?
I've found the syntax for doing it
http://0.0.0.0:3000/api/Projects?filter={"include":["workHours", "projectArticles"]}
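The whole filter object goes into a single filter query parameter as a JSON string. For illustration, a rough sketch of issuing that request from a script, using the third-party requests library (host, port, and model name are taken from the question above):
import json
import requests  # third-party: pip install requests

# Build the LoopBack filter and pass it as one JSON-encoded query parameter
filter_obj = {"include": ["workHours", "projectArticles"]}
response = requests.get(
    "http://0.0.0.0:3000/api/Projects",
    params={"filter": json.dumps(filter_obj)},
)
print(response.json())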
In the example snippet below I have some JSON which needs to be edited (over 1400 entries). I need to achieve two things:
In this example line: "phone": "+44 2079693900", I need to remove the whitespace between +44 and 2079693900, for all records, resulting in "+442079693900".
For latitude and longitude I need to get rid of the double quotes around the numbers, as the API I am using only accepts these values as floats.
Example: "latitude": "51.51736", needs to be: "latitude": 51.51736
I am most familiar with Ruby, and have done some parsing of JSON with this in the past, but I thought Regex would be the best tool to use for this kind of basic data cleaning task. I have referred to regex101.com and regular-expressions.info but I'm pretty stuck at this point. Thanks in advance!
[
{
"id": "101756",
"name": "1 Lombard Street
"email": "reception#1lombardstreet.com",
"website": "http://www.1lombardstreet.com",
"location": {
"latitude": "51.5129",
"longitude": "-0.089",
"address": {
"line1": "1 Lombard Street",
"line2": "",
"line3": "",
"postcode": "EC3V 9AA",
"city": "London",
"country": "UK"
}
}
},
{
"id": "105371",
"name": "108 Brasserie",
"phone": "+44 2079693900",
"email": "enquiries#108marylebonelane.com",
"website": "http://www.108brasserie.com",
"location": {
"latitude": "51.51795",
"longitude": "-0.15079",
"address": {
"line1": "108 Marylebone Lane",
"line2": "",
"line3": "",
"postcode": "W1U 2QE",
"city": "London",
"country": "UK"
}
}
},
{
"id": "108701",
"name": "1901 Restaurant",
"phone": "+44 2076187000",
"email": "london.restres#andaz.com",
"website": "http://www.andazdining.com",
"location": {
"latitude": "51.51736",
"longitude": "-0.08123",
"address": {
"line1": "Andaz Hotel",
"line2": "40 Liverpool Street",
"line3": "",
"postcode": "EC2M 7QN",
"city": "London",
"country": "UK"
}
}
},
{
"id": "102190",
"name": "2 Bridge Place",
"phone": "+44 2078028555",
"email": "fb#dtlondonvictoria.com",
"website": "http://crimsonhotels.comdoubletreelondonvictoriadiningpre-theatre-dining",
"location": {
"latitude": "51.49396",
"longitude": "-0.14343",
"address": {
"line1": "2 Bridge Place",
"line2": "Victoria",
"line3": "",
"postcode": "SW1V 1QA",
"city": "London",
"country": "UK"
}
}
},
{
"id": "102063",
"name": "2 Veneti",
"phone": "+44 2076370789",
"email": "2veneti#btconnect.com",
"website": "http://www.2veneti.com",
"location": {
"latitude": "51.5168",
"longitude": "-0.14673",
"address": {
"line1": "10 Wigmore Street",
"line2": "",
"line3": "",
"postcode": "W1U 2RD",
"city": "London",
"country": "UK"
}
}
},
You can use the following regex:
("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"
With the following replacement:
$1$2$3
The idea is to capture what we want and not capture what we do not, and then use backreferences to restore the substrings we want to keep.
Regex explanation:
The pattern contains two alternatives joined with the | alternation operator:
("phone":\s*"\+44)\s+:
("phone":\s*"\+44) - the 1st capturing group matching literal "phone": + optional whitespace, then +44 literally
\s+ - 1 or more whitespaces that we'll remove
("(?:latitude|longitude)":\s*)"([^"]+)":
("(?:latitude|longitude)":\s*) - the second capturing group matching "latitude": or "longitude": and 0 or more whitespace characters
" - Literal " that we'll drop
([^"]+) - the third capturing group matching 1 or more characters other than " (we'll keep that)
" - again, a literal " that we'll drop.
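To apply the replacement, any PCRE-style engine works (the question mentions Ruby, whose gsub handles this pattern fine). Here is a quick sketch in Python, where the callback concatenates whichever groups participated in the match, mirroring the $1$2$3 replacement; the file names are hypothetical:
import re

pattern = re.compile(r'("phone":\s*"\+44)\s+|("(?:latitude|longitude)":\s*)"([^"]+)"')

with open("restaurants.json") as f:        # hypothetical input file
    text = f.read()

# Unmatched groups are None, so they contribute nothing to the replacement
cleaned = pattern.sub(lambda m: "".join(g or "" for g in m.groups()), text)

with open("restaurants_clean.json", "w") as f:   # hypothetical output file
    f.write(cleaned)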