Capture JSON property names with regex

I want to create a regex that captures the property names in JSON objects so that I can color them. Then, in a loop (in TypeScript), I will wrap each captured match in a span with a class that colors it.
For example:
I have an object that looks like this
{"restriction_data" : "ALL","old" : null,"new" : ["ALL"],"record_type" : "product"}
I want to get restriction_data, old, new, record_type from the regex and color them red.
Other JSON that I can get is:
{"category":"category is mandatory","field":"[u'NONE'] is invalid field. Found: NONE","description":"description is mandatory"}
Likewise, I want to get category, field, description and color them red.
I tried \"(.*?)\" regex, but it doesn't quite work for me.
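One likely reason "(.*?)" falls short is that it matches every quoted string, values included. A minimal sketch of one alternative, written in Python for brevity (the pattern uses only constructs that JavaScript/TypeScript regular expressions also support): consume every string literal in turn and keep only those that are followed by a colon. The names STRING_RE and property_names are made up for this illustration, and the approach assumes the input is valid JSON.

```python
import re

# A JSON string literal, optionally followed by whitespace and a colon.
# Group 1 is the text between the quotes; group 2 is set only for keys.
STRING_RE = re.compile(r'"((?:[^"\\]|\\.)*)"(\s*:)?')

def property_names(json_text):
    """Return the property names in json_text, in order of appearance."""
    return [m.group(1) for m in STRING_RE.finditer(json_text) if m.group(2)]

sample = '{"restriction_data" : "ALL","old" : null,"new" : ["ALL"],"record_type" : "product"}'
print(property_names(sample))  # ['restriction_data', 'old', 'new', 'record_type']
```

Because string values are consumed as whole matches, a colon inside a value (such as "Found: NONE") is not mistaken for a key separator. For coloring, m.start(1) and m.end(1) give the offsets of each name, which is what a loop that wraps matches in a span would need.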

Related

Converting Powershell Array to Text so it can be exported to CSV or HTML

I'm trying to convert the output of a PowerShell (AWS Tools) command to strings so that I can export them to CSV or HTML. For the life of me, I can't figure it out. I've seen comments on hashtables, naming elements, etc. Nothing seems to help me. (I'm very much a newbie.)
This is what I got.
This command
(Get-IAMAccountAuthorizationDetail).UserDetailList | Select UserName, Grouplist
Will output this (with better spacing):
UserName GroupList
-------- ---------
User1    {Admins,Test}
User2    {Admins}
I can't seem to figure out how to get this data into a form that can be converted to CSV or HTML. Those braces are an indication that it's an object, an array or something similar. Can someone show me the code that would convert this to text, or to something that the ConvertTo-Csv or ConvertTo-Html commands would work with?
The output (subset) of the Get-Member Command is this:
TypeName : Amazon.IdentityManagement.Model.UserDetail
Name : Equals
MemberType : Method
Definition : bool Equals(System.Object obj)
TypeName : Amazon.IdentityManagement.Model.UserDetail
Name : GetHashCode
MemberType : Method
Definition : int GetHashCode()
TypeName : Amazon.IdentityManagement.Model.UserDetail
Name : GroupList
MemberType : Property
Definition : System.Collections.Generic.List[string] GroupList {get;set;}
Thanks
You could do something like the following, which will create a semicolon-delimited list within the GroupList cell:
(Get-IAMAccountAuthorizationDetail).UserDetailList |
    Select-Object UserName,@{n='GroupList';e={$_.Grouplist -join ';'}}
Explanation:
The syntax @{n='Name';e={Expression}} is called a calculated property, as explained in the documentation for Select-Object. Here is some information about the calculated property:
It is a hash table with custom properties.
The first property is Name, which is a label for your expression output. n,Name,l, and label are all acceptable property names for that property.
The value passed to n is just a string that you are creating. It is the property name that will show up in your output, and it does not need to already exist in your object. Your actual property is called GroupList. As an example, with n='All The Groups' the property name would become All The Groups in your output. There is nothing wrong with reusing the same name the property currently has.
The Expression or e is the ScriptBlock, which is why it is surrounded by {}. The ScriptBlock is responsible for producing the value in your custom property.
$_ is the current pipeline object passed into the ScriptBlock. This means if you have a collection (just like you do in your case), $_ will represent each of those items in order.
If you want to add another calculated property, just add a comma after the last and use the calculated property syntax like so:
Select-Object @{n='CustomProperty1';e={$_.ObjectProperty1}},@{n='CustomProperty2';e={$_.ObjectProperty2}}

Adding REGEX entities to SpaCy's Matcher

I am trying to add entities defined by regular expressions to SpaCy's NER pipeline. Ideally, I should be able to use any regular expression loaded from a json file with a defined entity type. As an example, I am trying to execute the code below.
The code below shows what I am trying to do, following an example given on Spacy's discussion about custom attributes using regular expressions. I have tried calling the 'set_extension' method in various ways (to Doc, Span, Token), but to no avail. I'm not even sure what I should be setting them to.
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_lg")
matcher = Matcher(nlp.vocab)
pattern = [{"_": {"country": {"REGEX": "^[Uu](\.?|nited) ?[Ss](\.|tates)$"}}}]
matcher.add("US", None, pattern)
doc = nlp(u"I'm from the United States.")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]
    span = doc[start:end]
    print(match_id, string_id, start, end, span.text)
I expect match_id, string_id 3 4 United States to be printed out.
Instead, I am getting AttributeError: [E046] Can't retrieve unregistered extension attribute 'country'. Did you forget to call the 'set_extension' method?
There's documentation around the extension attributes here: https://spacy.io/usage/processing-pipelines#custom-components-attributes
Basically you'll have to define this country variable as an extension attribute, something like this:
Token.set_extension("country", default="")
However, in the code you cited you're never actually setting the _.country attribute to any token (or span), so they're all still at default value, and the matcher will never be able to get a match on them. The line you cited:
pattern = [{"_": {"country": {"REGEX": "^[Uu](\.?|nited) ?[Ss](\.?|tates)$"}}}]
Tries to match the United States regex on the custom attribute values, instead of on the doc text, as you expect (I think).
One solution is to run the regular expressions on the token texts directly, one pattern element per token:
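import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_lg")
matcher = Matcher(nlp.vocab)
# Each dict describes one token, so the two words get one regex each.
pattern = [{"TEXT": {"REGEX": r"^[Uu](\.?|nited)$"}},
           {"TEXT": {"REGEX": r"^[Ss](\.?|tates)$"}}]
# spaCy v2 signature; in v3 this is matcher.add("US", [pattern])
matcher.add("US", None, pattern)
doc = nlp(u"I'm from the United States.")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # look up the string name of the match ID
    span = doc[start:end]                    # the matched tokens as a Span
    print(match_id, string_id, start, end, span.text)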
Which outputs
15397641858402276818 US 4 6 United States
Then you can use those matches to, for example, set a custom attribute on the Span or Token (in this case the Span, because your match potentially involves multiple tokens).
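To see why the pattern has to be split in two, it may help to remember that each dictionary in a Matcher pattern describes a single token. The following plain-Python sketch mimics that token-by-token matching without spaCy. The token list is written by hand as an assumption about how the sentence is tokenized, and find_pairs is a hypothetical helper, not part of spaCy.

```python
import re

# One regex per token, mirroring a two-element Matcher pattern.
FIRST = re.compile(r"^[Uu](\.?|nited)$")
SECOND = re.compile(r"^[Ss](\.?|tates)$")

def find_pairs(tokens):
    """Return (start, end, text) for adjacent tokens matching FIRST then SECOND."""
    hits = []
    for i in range(len(tokens) - 1):
        if FIRST.match(tokens[i]) and SECOND.match(tokens[i + 1]):
            hits.append((i, i + 2, " ".join(tokens[i:i + 2])))
    return hits

# Hand-written tokenization (an assumption, not produced by spaCy).
tokens = ["I", "'m", "from", "the", "United", "States", "."]
print(find_pairs(tokens))  # [(4, 6, 'United States')]
```

A single regex containing a space, such as the one in the question, can never match here because no individual token contains both words.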

Converting "Textarea" object from iPython widget to a list or iterable array

I have created several Textarea widgets in Jupyter/Python in order to capture some string inputs.
The idea is that the user pastes a list of numbers (copied from Excel) into the Textarea highlighted in yellow in the screenshot, and I later need to convert this text into a list or array of those numbers (an iterable object). I have no idea how to do this.
When I print the type of this object that is called "plus" I get this:
print(type(plus))
<class 'ipywidgets.widgets.widget_string.Textarea'>
But, I am expecting to have something like this:
plus = [454, 555]
Can I bounce some ideas off you to get this?
Thanks a lot!!!
If you have an ipywidget in general, you can read its value and observe changes to it as follows.
import ipywidgets as widgets

foo = widgets.Textarea()
# to get the value
foo.value
# to do something on value change
def bar(change):
    print(change.new)
foo.observe(bar, names=['value'])
You will then have to parse the string you get from the widget's value, but that shouldn't be too difficult.
Hope this helps
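As a sketch of that last parsing step: the helper below (parse_numbers is a made-up name) splits the Textarea's string on whitespace, commas and semicolons and converts each piece to an integer. It assumes the pasted cells contain whole numbers only; anything else would raise a ValueError.

```python
import re

def parse_numbers(text):
    """Split pasted text on whitespace, commas or semicolons and return ints."""
    pieces = re.split(r"[\s,;]+", text.strip())
    return [int(piece) for piece in pieces if piece]

# A column copied from Excel usually arrives as one number per line.
pasted = "454\n555\n"
print(parse_numbers(pasted))  # [454, 555]
```

With the widget from the question this would be called as parse_numbers(plus.value).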

Search for an item in a text file using UIMA Ruta

I have been trying to search for an item which is there in a text file.
The text file is like
Eg:
>HEADING
00345
XYZ
MethodName : fdsafk
Date: 23-4-2012
More text and some part containing instances of XYZ
So I did a dictionary search for XYZ initially and found the positions, but I want only the first XYZ and not the rest. The XYZ I want has one distinguishing property: it is always located between the 5-digit code and the text MethodName.
I am unable to do that.
WORDLIST ZipList = 'Zipcode.txt';
DECLARE Zip;
Document
Document{-> MARKFAST(Zip, ZipList)};
DECLARE Method;
"MethodName" -> Method;
WORDLIST typelist = 'typelist.txt';
DECLARE type;
Document{-> MARKFAST(type, typelist)};
Also how do we use REGEX in UIMA RUTA?
There are many ways to specify this. Here are some examples (not tested):
// just remove the other annotations (assuming type is the one you want)
type{-> UNMARK(type)} ANY{-STARTSWITH(Method)};
// only keep the first one: remove any annotation if there is one somewhere in front of it
// you can also specify this with POSITION or CURRENTCOUNT, but both are slow
type # @type{-> UNMARK(type)};
// just create a new annotation in between
NUM{REGEXP(".....")} #{-> type} @Method;
There are two options to use regex in UIMA Ruta:
(find) simple regex rules like "[A-Za-z]+" -> Type;
(matches) REGEXP conditions for validating the match of a rule element like
ANY{REGEXP("[A-Za-z]+")-> Type};
Let me know if something is not clear. I will extend the description then.
DISCLAIMER: I am a developer of UIMA Ruta

How to read semicolon separated certain values from a QString?

I am developing an application using Qt/KDE. While writing code for this, I need to read a QString that contains values like ( ; delimited)
<http://example.com/example.ext.torrent>; rel=describedby; type="application/x-bittorrent"; name="differentname.ext"
I need to read every attribute like rel, type and name into a different QString. The approach I have taken so far is something like this:
if (line.contains("describedby")) {
m_reltype = "describedby" ;
}
if (line.contains("duplicate")) {
m_reltype = "duplicate";
}
That is, when I only care about the presence of an attribute (and not its value), I manually look for the text and set a member if the attribute is present. This approach, however, fails for attributes like type and name, whose actual values need to be stored in a QString. I know this can be done by splitting the entire string at the delimiter ; and then searching each piece for the attribute or its value, but I wanted to know whether there is a cleaner and more efficient way of doing it.
As I understand it, the data is not always a URL.
So,
1: Split the string
2: For each substring, separate the identifier from the value:
id = str.mid(0,str.indexOf("="));
value = str.mid(str.indexOf("=")+1);
You can also use a RegExp:
QRegExp rx("^([a-z]+)\\s*=\\s*(.*)$");
if (rx.indexIn(str.trimmed()) != -1) {
    id = rx.cap(1);    // attribute name, e.g. rel
    value = rx.cap(2); // attribute value, e.g. describedby
}
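The split-then-separate steps can be sketched as follows, in Python for brevity. parse_attributes is a hypothetical helper; it assumes that no attribute value itself contains a semicolon and that the first segment is the bracketed target.

```python
def parse_attributes(line):
    """Split a ';'-delimited line into its first segment and a dict of attributes."""
    segments = [segment.strip() for segment in line.split(";")]
    target = segments[0].strip("<>")
    attributes = {}
    for segment in segments[1:]:
        name, sep, value = segment.partition("=")
        if sep:  # skip segments that have no '='
            attributes[name.strip()] = value.strip().strip('"')
    return target, attributes

line = ('<http://example.com/example.ext.torrent>; rel=describedby; '
        'type="application/x-bittorrent"; name="differentname.ext"')
target, attributes = parse_attributes(line)
print(target)
print(attributes)
```

In Qt the same steps would map onto QString::split, trimmed, indexOf and mid, as in the snippet above.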
I need to read every attribute like rel, type and name into a different QString.
Is there a guarantee that this string will always be a URL?
I wanted to know is there a cleaner and a more efficient way of doing it.
Don't reinvent the wheel! You can use QUrl::queryItems, which parses the query variables and returns a list of name-value pairs.
However, make sure that your string is a well-formed URL (so that QUrl does not reject it).