Working with JSON in the Watson Python SDK - python-2.7

I am working on a project that will hopefully allow me to combine the Watson Python SDK implementations of speech-to-text, Watson Conversation, and text-to-speech. I am running into some problems, though, working with the JSON data in Python 2.7. I am actually trying to do two things:
1) I want to parse the JSON data for just the transcript values, and it would be awesome if I could combine those values into an easily readable string format for use later in the program.
2) The other thing I need to do is manipulate the JSON in a way that would allow me to use it as input for the conversation or text-to-speech sections. Basically, how can I convert what's provided in the JSON into acceptable input for the other Watson modules?
What I've tried so far:
I read the Python 2.7 json docs and tried to convert the data back into a Python dictionary, which sort of worked. All of the key:value pairs had a "u" before them, and none of the regular dictionary methods seemed to work on them. They also don't look like standard key:value combinations. I was able to put all of the JSON data into one variable, though. I'll post my code below (ignore the print statements; I was just checking how the data looked at each step), but it's mostly what you can get from the GitHub examples section.
** Just a quick final question too: Is the Python SDK limited in any way compared to the other ones (Java, JavaScript, etc.)? It seems like their output is much easier to work with.
import json
from os.path import join, dirname
from watson_developer_cloud import SpeechToTextV1

speech_to_text = SpeechToTextV1(
    username='###########',
    password='###########',
    x_watson_learning_opt_out=True
)

with open(join(dirname(__file__), '/home/user/Desktop/output.wav'), 'rb') as audio_file:
    json_str = json.dumps(speech_to_text.recognize(
        audio_file, content_type='audio/wav', timestamps=False,
        word_confidence=False, model='en-US_NarrowbandModel'), indent=2)
    print json_str
    json_dict = json.loads(json_str)
    print json_dict

def main(args):
    return 0

if __name__ == '__main__':
    import sys
    sys.exit(main(sys.argv))

The issue appears to be that you are dumping your JSON to a string, then trying to access it as an object.
Using the following sample code, it works.
import json
from os.path import join, dirname
from watson_developer_cloud import SpeechToTextV1

speech_to_text = SpeechToTextV1(
    username='....',
    password='....',
    x_watson_learning_opt_out=True
)

with open('../blog/ihaveadream.wav', 'rb') as audio_file:
    response = speech_to_text.recognize(
        audio_file, content_type='audio/wav', timestamps=False,
        word_confidence=False, model='en-US_NarrowbandModel')
    print json.dumps(response, indent=2)
This returns the following:
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "transcript": "I still have a dream "
        }
      ],
      "final": true
    },
    {
      "alternatives": [
        {
          "confidence": 0.999,
          "transcript": "it is a dream deeply rooted in the American dream I have a dream "
        }
      ],
      "final": true
    },
    {
      "alternatives": [
        {
          "confidence": 1.0,
          "transcript": "that one day this nation will rise up and live out the true meaning of its creed we hold these truths to be self evident that all men are created equal "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0,
  "warnings": [
    "Unknown arguments: continuous."
  ]
}
So if you want to access the first result in the response, you can do the following.
print 'Confidence: {}'.format(response['results'][0]['alternatives'][0]['confidence'])
print 'Transcript: {}'.format(response['results'][0]['alternatives'][0]['transcript'])
The output of that would be:
Confidence: 1.0
Transcript: I still have a dream
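
To address the first part of the question (combining all the transcript values into one readable string), a minimal sketch along these lines should work, since response is already a plain Python dict; the variable names are illustrative:

# Collect the transcript from each result and join them into
# a single readable string.
transcripts = [result['alternatives'][0]['transcript'].strip()
               for result in response['results']]
full_text = ' '.join(transcripts)
print full_text

A plain string like this is also the general form you would pass on as input to the Conversation or Text to Speech services.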

Related

Regex Statement in VSCode snippet for removing file extension

I'd like to create a VS-Code snippet for importing css into a react component. If I'm using the snippet in "MyComponent.tsx", then I'd like the snippet to import the associated css file for the component:
import "./MyComponent.css";
The component and its css will always be located in the same directory.
I thought that the following snippet would be able to do this:
//typescriptreact.json
"import componet css": {
  "prefix": "icss2",
  "body": [
    "import \"./${1:$TM_FILENAME/^(.+)(\.[^ .]+)?$/}.css\";"
  ],
  "description": ""
},
But this results in:
import "./MyComponent.tsx/^(.+)([^ .]+)?$/.css";
What's the correct way to do this?
You can use
"import componet css": {
"prefix": "icss2",
"body": [
"import \"./${TM_FILENAME_BASE/^(.*)\\..*/$1/}.css\";"
],
"description": ""
}
The ${TM_FILENAME_BASE} variable holds the file name without the path, and the ^(.*)\\..* regex captures everything up to the last . while matching the extension that follows; only the captured part remains, because the $1 replacement pattern refers to the Group 1 value.
"import component css": {
"prefix": "icss2",
"body": [
"import \"./${TM_FILENAME_BASE}.css\";"
],
"description": ""
}
TM_FILENAME_BASE: The filename of the current document without its extensions
(from the snippet variables documentation)
So there is no need to remove the .tsx extension via a transform - it is already done for you.
The more interesting question is what if you have a file like
myComponent.next.tsx // what should the final result be?
${TM_FILENAME_BASE} will only take off the final .tsx resulting in import "./myComponent.next.css";
@Wiktor's answer results in import "./myComponent.css";
Which is correct in your case? Is something like myComponent.next.tsx a possible case for you? If not just use ${TM_FILENAME_BASE} with no need for a transform.
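If names like myComponent.next.tsx are possible and you always want everything before the first dot, an untested variant that transforms ${TM_FILENAME} instead could look like the following (the snippet key and prefix are just placeholders; the regex captures up to the first . and drops the rest):

"import component css (strip all extensions)": {
  "prefix": "icss3",
  "body": [
    "import \"./${TM_FILENAME/^([^.]*)\\..*$/$1/}.css\";"
  ],
  "description": ""
}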

How to change the experiment file path generated when running Ray's run_experiments()?

I'm using the following spec on my code to generate experiments:
experiment_spec = {
    "test_experiment": {
        "run": "PPO",
        "env": "MultiTradingEnv-v1",
        "stop": {
            "timesteps_total": 1e6
        },
        "checkpoint_freq": 100,
        "checkpoint_at_end": True,
        "local_dir": '~/Documents/experiment/',
        "config": {
            "lr_schedule": grid_search(LEARNING_RATE_SCHEDULE),
            "num_workers": 3,
            'observation_filter': 'MeanStdFilter',
            'vf_share_layers': True,
            "env_config": {
            },
        }
    }
}

ray.init()
run_experiments(experiments=experiment_spec)
Note that I use grid_search to try various learning rates. The problem is "lr_schedule" is defined as:
LEARNING_RATE_SCHEDULE = [
    [
        [0, 7e-5],  # [timestep, lr]
        [1e6, 7e-6],
    ],
    [
        [0, 6e-5],
        [1e6, 6e-6],
    ]
]
So when the experiment checkpoint is generated, it has a lot of [ characters in its path name, making the path unreadable to the interpreter. Like this:
~/Documents/experiment/PPO_MultiTradingEnv-v1_0_lr_schedule=[[0, 7e-05], [3500000.0, 7e-06]]_2019-08-14_20-10-100qrtxrjm/checkpoint_40
The obvious solution is to rename it manually, but I discovered that its name is referenced in other files like experiment_state.json, so the best solution would be to set a custom experiment path and name.
I didn't find anything in documentation.
This is my project if it helps
Can someone help?
Thanks in advance
You can set custom trial names - https://ray.readthedocs.io/en/latest/tune-usage.html#custom-trial-names. Let me know if that works for you.
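For reference, a minimal sketch of the approach those docs describe, assuming a Ray version that supports trial_name_creator (the name format below is just an example, and some 0.7.x versions require wrapping the function with tune.function):

from ray import tune

def trial_name_string(trial):
    # Build a short, filesystem-safe name instead of the default one
    # that embeds the full (bracket-laden) config in the path.
    return "{}_{}".format(trial.trainable_name, trial.trial_id)

tune.run(
    "PPO",
    name="test_experiment",
    stop={"timesteps_total": 1e6},
    local_dir="~/Documents/experiment/",
    config={"num_workers": 3},
    trial_name_creator=trial_name_string,
)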

How to find plurals with Google Cloud Natural Language API

The Google Cloud Natural Language API can be used to analyse text and return a syntactic parse tree with each word labeled with parts-of-speech tags.
Is there a way to determine whether a noun is plural or not?
If Google Cloud NL is able to work out the lemma, then perhaps the information is there but just not returned through the API?
Update
With the NL API's GA launch, the annotateText endpoint now returns a number key for each token indicating whether the word is singular, plural, or dual. For the sentence "There are some cats here," the API returns the following token data for 'cats' (notice that number is PLURAL):
{
  "text": {
    "content": "cats",
    "beginOffset": -1
  },
  "partOfSpeech": {
    "tag": "NOUN",
    "aspect": "ASPECT_UNKNOWN",
    "case": "CASE_UNKNOWN",
    "form": "FORM_UNKNOWN",
    "gender": "GENDER_UNKNOWN",
    "mood": "MOOD_UNKNOWN",
    "number": "PLURAL",
    "person": "PERSON_UNKNOWN",
    "proper": "PROPER_UNKNOWN",
    "reciprocity": "RECIPROCITY_UNKNOWN",
    "tense": "TENSE_UNKNOWN",
    "voice": "VOICE_UNKNOWN"
  },
  "dependencyEdge": {
    "headTokenIndex": 1,
    "label": "DOBJ"
  },
  "lemma": "cat"
}
See the full documentation here.
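For example, a plurality check with the google-cloud-language Python client might look like the sketch below (client versions differ slightly in method and enum naming, so treat this as illustrative):

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="There are some cats here",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)
response = client.analyze_syntax(document=document)

for token in response.tokens:
    # number is one of NUMBER_UNKNOWN, SINGULAR, PLURAL, DUAL
    if token.part_of_speech.number == language_v1.PartOfSpeech.Number.PLURAL:
        print("{} (lemma: {}) is plural".format(token.text.content, token.lemma))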
Thanks for trying out the NL API.
Right now there isn't a clean way to detect plurals other than to note that the base word is different from the lemma and guess whether it's plural (in English, perhaps it ends in an -s).
However, we plan to release a much better way of detecting morphological information like plurality, so stay tuned.
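
Until then, a rough heuristic based on the above (illustrative only, and English-specific) might be:

def looks_plural(word, lemma):
    # Guess plurality by comparing the surface form to its lemma:
    # if they differ and the word ends in -s, it's probably plural.
    word, lemma = word.lower(), lemma.lower()
    return word != lemma and word.endswith('s')

print(looks_plural('cats', 'cat'))  # True
print(looks_plural('cat', 'cat'))   # False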

Can't find strings that aren't words in Django Haystack/Elasticsearch

I'm using Django Haystack with Elasticsearch as the backend for a real-time flight mapping service.
I have all my search indexes set up correctly; however, I'm having trouble returning results for searches that aren't full words (such as aviation callsigns, some of which take the style N346IF, while others include full words, as in Speedbird 500). The N346IF style of query doesn't yield any results, whereas I can easily return results for the latter example.
I make my query as below:
queryResults = SearchQuerySet().filter(content=q) # where q is the query in string format
(note that in the past I used the AutoQuery queryset, but the documentation notes that this only handles words, so I'm passing a raw string now).
I have my search index fields setup as EdgeNgramField with search templates.
I have a custom backend with the following index settings (as well as both the snowball analyzer and the pattern analyzer):
ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_ngram"]
                },
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_edgengram"]
                }
            },
            "tokenizer": {
                "haystack_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 4,
                    "max_gram": 15,
                },
                "haystack_edgengram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": 4,
                    "max_gram": 15,
                    "side": "front"
                }
            },
            "filter": {
                "haystack_ngram": {
                    "type": "nGram",
                    "min_gram": 4,
                    "max_gram": 15
                },
                "haystack_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 4,
                    "max_gram": 15
                }
            }
        }
    }
}
ELASTICSEARCH_DEFAULT_ANALYZER = "pattern"
My backend is configured as:
class ConfigurableElasticBackend(ElasticsearchSearchBackend):
    DEFAULT_ANALYZER = "pattern"

    def __init__(self, connection_alias, **connection_options):
        super(ConfigurableElasticBackend, self).__init__(
            connection_alias, **connection_options)
        user_settings = getattr(settings, 'ELASTICSEARCH_INDEX_SETTINGS')
        user_analyzer = getattr(settings, 'ELASTICSEARCH_DEFAULT_ANALYZER')
        if user_settings:
            setattr(self, 'DEFAULT_SETTINGS', user_settings)
        if user_analyzer:
            setattr(self, 'DEFAULT_ANALYZER', user_analyzer)

    def build_schema(self, fields):
        content_field_name, mapping = super(ConfigurableElasticBackend,
                                            self).build_schema(fields)
        for field_name, field_class in fields.items():
            field_mapping = mapping[field_class.index_fieldname]
            if field_mapping['type'] == 'string' and field_class.indexed:
                if not hasattr(field_class, 'facet_for') and not \
                        field_class.field_type in ('ngram', 'edge_ngram'):
                    field_mapping['analyzer'] = self.DEFAULT_ANALYZER
            mapping.update({field_class.index_fieldname: field_mapping})
        return (content_field_name, mapping)


class ConfigurableElasticSearchEngine(ElasticsearchSearchEngine):
    backend = ConfigurableElasticBackend
What would be the correct setup to successfully yield results for search patterns that are full words, N346IF-style strings, or a mix of both?
Appreciate any input, apologies if this is similar to another question (could not find anything related to it).
edit: requested by solarissmoke, the schema for this model:
class FlightIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.EdgeNgramField(document=True, use_template=True)
    flight = indexes.CharField(model_attr='flightID')
    callsign = indexes.CharField(model_attr='callsign')
    displayName = indexes.CharField(model_attr='displayName')
    session = indexes.CharField(model_attr='session')

    def prepare_session(self, obj):
        return obj.session.serverId

    def get_model(self):
        return Flight
Text is indexed as:
flight___{{ object.callsign }}___{{ object.displayName }}
It doesn't fully explain the behaviour you are seeing, but I think the problem is with how you are indexing your data - specifically the text field (which is what gets searched when you filter on content).
Take the example data you provided, callsign N133TC, flight name Shahrul Nizam. The text document for this data becomes:
flight___N133TC___Shahrul Nizam
You have set this field as an EdgeNgramField (min 4 chars, max 15). Here are the ngrams that are generated when this document is indexed (I've ignored the lowercase filter for simplicity):
flig
fligh
flight
flight_
flight___
flight___N
flight___N1
flight___N13
flight___N133
flight___N133T
flight___N133TC
Niza
Nizam
Note that the tokenizer does not split on underscores. Now, if you search for N133TC, none of the above tokens will match. (I can't explain why Shahrul works... it shouldn't, unless I've missed something, or there are spaces at the start of that field).
If you changed your text document to:
flight N133TC Shahrul Nizam
Then the indexed tokens would be:
flig
flight
N133
N133T
N133TC
Shah
Shahr
Shahru
Shahrul
Niza
Nizam
Now, a search for N133TC should match.
Note also that the flight___ string in your document generates a whole load of (most likely) useless tokens - unless this is deliberate you may be better off without it.
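One way to check this for yourself is Elasticsearch's analyze API, which shows exactly which tokens an analyzer emits for a given string. A sketch using the elasticsearch-py client (the index name and host are illustrative, and the exact call signature varies between client versions):

from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])

# Ask the index to analyze a sample document with the configured
# analyzer and print the tokens it would actually index.
result = es.indices.analyze(
    index='haystack',
    analyzer='edgengram_analyzer',
    text='flight N133TC Shahrul Nizam',
)
for token in result['tokens']:
    print(token['token'])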
Solving my own question - appreciate the input by solarissmoke as it has helped me track down what was causing this.
My answer is based on Greg Baker's answer to the question ElasticSearch: EdgeNgrams and Numbers.
The issue appears to be related to the use of numeric values within the search text (in my case, the N133TC pattern). Note that I was using the snowball analyzer at first, before switching to pattern; neither of them worked.
I adjusted my analyzer setting in settings.py:
"edgengram_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["haystack_edgengram"]
}
This changes the tokenizer value from the original lowercase tokenizer to standard.
I then set the default analyzer to be used in my backend to the edgengram_analyzer (also on settings.py):
ELASTICSEARCH_DEFAULT_ANALYZER = "edgengram_analyzer"
This does the trick! It still works as an EdgeNgram field should, but allows for my numeric values to be returned properly too.
I've also followed the advice in the answer by solarissmoke and removed all the underscores from my index files.

Adding Targets to Target Lists using REST API with SugarCRM

I'm trying to add targets to target lists in Sugar via REST service calls. I'm getting a positive response from Sugar but records are not added. The service method I'm using is *set_relationship*:
{
  "session": "3ece4lmn5rtweq9vm5581jht",
  "module_name": "ProspectLists",
  "module_id": "cb13b96f-8334-733c-1548-52c27a5b8b99",
  "link_field_name": "prospects",
  "name_value_list": [],
  "related_ids": ["534f894a-4265-143d-c94b-52be908685b1"],
  "delete": 0
}
I also tried it the other way around:
{
  "session": "3ece4lmn5rtweq9vm5581jht",
  "module_name": "Prospects",
  "module_id": "cb13b96f-8334-733c-1548-52c27a5b8b99",
  "link_field_name": "prospect_lists",
  "name_value_list": [],
  "related_ids": ["534f894a-4265-143d-c94b-52be908685b1"],
  "delete": 0
}
In both cases I get a promising response:
{"created":1,"failed":0,"deleted":0}
...but when I check the target list I can't find any added targets. I also checked the database but there is no trace either.
My Sugar Version is 6.5.16 CE and I'm using the SuiteCRM 7.0.1 extension but I don't think this makes a difference here.
Any hint is highly appreciated. Thanks!
I finally figured it out. It seems that set_relationship is very picky about the parameter order; the parameter names don't even seem to matter. This worked in the end for me:
{
  "session": "3ece4lmn5rtweq9vm5581jht",
  "module_name": "Prospects",
  "module_id": "cb13b96f-8334-733c-1548-52c27a5b8b99",
  "link_field_name": "prospect_lists",
  "related_ids": ["534f894a-4265-143d-c94b-52be908685b1"],
  "delete": 0
}
Working Python code (API v4.1):
import sugarcrm
import json
import requests

crm_session = sugarcrm.Session(CRM_HOST, CRM_USER, CRM_PASS)
payload = {
    "method": "set_relationship",
    "input_type": "JSON",
    "response_type": "JSON",
    "rest_data": json.dumps({
        "session": crm_session.session_id,
        "module_name": "Prospects",
        # ID of the record you're creating the relationship FROM.
        # In my case it is a record from the module "Prospects".
        "module_id": "cb13b96f-8334-733c-1548-52c27a5b8b99",
        "link_field_name": "events_prospects",
        # ID of the record you're creating the relationship FOR.
        # In my case it is a record from the module "Events".
        "related_ids": ["534f894a-4265-143d-c94b-52be908685b1"],
        "name_value_list": [],
        "delete": 0
    })
}
result = requests.post(CRM_HOST, data=payload)
result = requests.post(CRM_HOST, data=payload)
@Till is right: be careful with the order of the "rest_data" parameters. In my case, placing name_value_list before related_ids produced positive responses with no actual relationship created.
P.S. I'm using this library: https://pypi.python.org/pypi/sugarcrm/0.1