Non-Latin characters are underscores under RESTful Google Translate API v2 - google-cloud-platform

I'm trying to use Google's translate method from its Translation API as documented here, but for some reason the translations I get replace non-Latin characters with underscores.
For instance, with curl on the command-line:
$ curl -X POST 'https://translation.googleapis.com/language/translate/v2/?source=en&target=de&q=Practicing+diligently+each+day+means+inevitable+improvement.&key=MY_API_KEY'
{
"data": {
"translations": [
{
"translatedText": "T_glich flei_ig zu _ben, bedeutet unausweichliche Verbesserung."
}
]
}
}
Compare to the English-to-German result from translate.google.com:
Täglich fleißig zu üben, bedeutet unausweichliche Verbesserung.
It's especially bad when the target is a language like Japanese, which doesn't contain Latin characters:
$ curl -X POST 'https://translation.googleapis.com/language/translate/v2/?source=en&target=ja&q=Practicing+diligently+each+day+means+inevitable+improvement.&key=MY_API_KEY'
{
"data": {
"translations": [
{
"translatedText": "______________________________________________________"
}
]
}
}
Maybe this is a trial account limitation? Nothing I've seen in the docs would indicate this, however.

I believe it's a string-encoding issue.
I assume your HTTP request body is being sent using application/x-www-form-urlencoded - which does not support characters above 0x7F (128) as literal text, see here: application/x-www-form-urlencoded and charset="utf-8"?
I suggest:
POST with an explicit Content-Type: application/json header with the charset=utf-8 field set. (x-www-form-urlencoded does not support the charset field).
Ensure your terminal is using UTF-8
Also take a look using a tool like Wireshark, or create the request in JavaScript using fetch and use Chrome's Developer Tools' Network tab's "Copy as cURL (Bash)" command to get the terminal command to use.
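As a rough sketch of the first suggestion, the request could be built in Python with the JSON body encoded explicitly as UTF-8. The helper names build_translate_request and describe_code_points are made up for illustration; the endpoint and field names are the ones from the question. Nothing is sent until the request object is passed to urllib.request.urlopen.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the question above.
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"


def build_translate_request(text, source, target, api_key):
    """Build (but do not send) a JSON POST request with a UTF-8 body.

    Hypothetical helper, for illustration only.
    """
    url = TRANSLATE_URL + "?" + urllib.parse.urlencode({"key": api_key})
    # ensure_ascii=False keeps non-Latin characters as real UTF-8 bytes
    # instead of \uXXXX escapes.
    body = json.dumps(
        {"q": text, "source": source, "target": target},
        ensure_ascii=False,
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def describe_code_points(text):
    """Return the text as ASCII-only escapes, independent of the terminal."""
    return text.encode("unicode_escape").decode("ascii")
```

Printing describe_code_points(translated_text) shows escapes such as \xe4 when the API really returned "ä", and a literal underscore only if the data itself contains one. That can help separate an API problem from a terminal display problem.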

Somewhat embarrassingly, this was actually just an issue with tmux, the terminal multiplexer I was using to read the output of every call I made to the Translation API, both with curl and with the printed output of the code I was writing.
As per this Ask Ubuntu answer to someone else's tmux question, this is fixable by explicitly telling tmux to launch with UTF-8 support, i.e., tmux -u.
Thanks both to Dai and Daniel for pointing to a potential terminal issue.

I just tried with the following request and it worked well:
curl -X POST "https://translation.googleapis.com/language/translate/v2?key=MY_API_KEY" \
-H "Content-Type: application/json" \
--data "{
'q': 'Practicing diligently each day means inevitable improvement.',
'source': 'en',
'target': 'de'
}"
Giving this output:
{
"data": {
"translations": [
{
"translatedText": "Täglich fleißig zu üben, bedeutet unausweichliche Verbesserung."
}
]
}
}
And for the Japanese output:
{
"data": {
"translations": [
{
"translatedText": "毎日熱心に練習することは避けられない改善を意味します。"
}
]
}
}
Hope it helps

Related

How to preserve line breaks in text content de-identified with Data Loss Prevention?

I am using an API call content.deidentify to de-identify text content. It is working as expected, but newline characters get stripped.
API call
curl -s \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://dlp.googleapis.com/v2/projects/$PROJECT_ID/content:deidentify \
-d @text-request.json \
> text-response.json
Input
Eleanor Rigby
Pharmacist
Liverpool Hospital
eleanor.rigby@example.com
Output
{
"item": {
"value": "Eleanor Rigby Pharmacist Liverpool Hospital [email-address]"
},
"overview": {
...
}
}
Is there any option I can add to the request to preserve the line breaks?
I found setPrettyPrint in the Java client documentation. Can I use this option when calling the API directly?
The issue had nothing to do with DLP.
I was sending invalid JSON:
{
"item": {
"value": "Eleanor Rigby
Pharmacist
Liverpool Hospital
eleanor.rigby@example.com"
},
"deidentifyConfig": {
...
}
}
Replacing the newline characters with \n solved the problem.
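One way to avoid hand-escaping is to let a JSON library serialize the text. The sketch below uses Python's json module; build_item_payload is a hypothetical helper name, and the deidentifyConfig part of the request is left out because it is elided in the original.

```python
import json


def build_item_payload(value):
    """Serialize the text as a JSON request fragment.

    json.dumps escapes each newline as the two characters backslash + n,
    so the resulting document is valid JSON. Hypothetical helper; a real
    request would also need the deidentifyConfig section.
    """
    return json.dumps({"item": {"value": value}})


text = "Eleanor Rigby\nPharmacist\nLiverpool Hospital\neleanor.rigby@example.com"
payload = build_item_payload(text)

# Parsing the payload back recovers the original line breaks.
round_tripped = json.loads(payload)["item"]["value"]
```

Here payload contains no raw line breaks, while round_tripped is identical to the original multi-line text.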

Google Cloud Speech API longrunningrecognize only returns name

I'm trying to convert over an hour of audio data to text using the Google Cloud Speech API, and I'm using the API explorer since it's easy.
The request looks like this.
POST https://speech.googleapis.com/v1/speech:longrunningrecognize?key={YOUR_API_KEY}
{
"audio": {
"uri": "gs://data/audio.flac"
},
"config": {
"encoding": "FLAC",
"languageCode": "en-US"
}
}
The response looks like this.
200
{
"name": "`numbers`"
}
How come it is only returning the name, and not returning the text of the audio?
Just had the same problem.
Found the answer on https://cloud.google.com/speech/docs/async-recognize
If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:
{
"name": "5543203840552489181"
}
where name is the name of the long running operation created for the request.
Wait approximately 30 seconds for processing to complete. To retrieve the result of the operation, make a GET request:
GET https://speech.googleapis.com/v1/operations/YOUR_OPERATION_NAME?key=YOUR_API_KEY
Got my results with:
curl -s -k -H "Content-Type: application/json" \
-H "Authorization: Bearer {access_token}" \
https://speech.googleapis.com/v1/operations/{name}
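Rather than waiting a fixed 30 seconds, the GET can be repeated until the operation reports completion. The sketch below assumes the standard long-running operation fields done, response, and error, which are not shown in the answer above. fetch_operation is a hypothetical callable that performs the GET request and returns the parsed JSON as a dict.

```python
import time


def wait_for_operation(fetch_operation, name, interval_seconds=30.0,
                       max_attempts=20, sleep=time.sleep):
    """Poll an operation until it is done, then return its response.

    fetch_operation(name) is assumed to return the operation as a dict.
    """
    for attempt in range(max_attempts):
        operation = fetch_operation(name)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError("operation failed: %r" % (operation["error"],))
            return operation.get("response", {})
        # Not finished yet: wait before asking again (skip after last try).
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("operation %s did not finish in time" % name)
```

The sleep parameter is injectable only so the loop can be exercised without real delays.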

google speech api Invalid recognition

I am trying to follow the example on google speech api found here
https://cloud.google.com/speech/docs/getting-started
1) I created the following JSON request file
{
'config': {
'encoding':'FLAC',
'sampleRate': 16000,
'languageCode': 'en-US'
},
'audio': {
'uri':'gs://cloud-samples-tests/speech/brooklyn.flac'
}
}
2) Authenticate to my service account
gcloud auth activate-service-account --key-file=service-account-key-file
3) Obtain my authorization token successfully
gcloud auth print-access-token
access_token
4) Then use the following curl command
curl -s -k -H "Content-Type: application/json" \
-H "Authorization: Bearer access_token" \
https://speech.googleapis.com/v1beta1/speech:syncrecognize \
-d @sync-request.json
But I keep getting the following response
{
"error": {
"code": 400,
"message": "Invalid recognition 'config': bad encoding..",
"status": "INVALID_ARGUMENT"
}
}
Do I need access permissions for the uri gs://cloud-samples-tests/speech/brooklyn.flac? Is that what the problem is?
Thanks in advance..
In my opinion, it is a file format issue.
You have to send WAV file instead of FLAC ...
[ FLAC and MP3 format are not supported <=> need a file conversion (representing cost) on the server side ]
Convert your audio file to WAV (using ffmpeg or avconv), then retry.
You may also take a look here (to see a working example)
For me, the solution was to remove the space between "-d" and "@",
so change "-d @sync-request.json" to "-d@sync-request.json".
I got help here: https://groups.google.com/forum/#!topic/cloud-speech-discuss/bL_N5aJDG5A. Apparently the file was being read and processed, but the params were going to "curl.exe" instead of being passed to the URL.
I understand this is quite late for an answer. However, it might help others, so I'm pointing out the error.
The config that you are passing is actually incorrect. The attributes should be like:
{
"config": {
"encoding": "LINEAR16",
"sampleRateHertz": 16000,
"languageCode": "en-US",
"maxAlternatives": 1,
"profanityFilter": true,
"enableWordTimeOffsets": false
},
"audio": {
"uri": "<your uri>"
}
}

Is there any easy way / API to find out the number of pipelines on a gocd server?

Sorry for the brief question, but just wondering if there's an API to find out the number of pipelines on a GoCD server.
The Pipeline Groups API will give you what you need after some JSON parsing.
$ curl 'https://ci.example.com/go/api/config/pipeline_groups' \
-u 'username:password'
Returns:
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
[
{
"pipelines": [
{
"stages": [
{
"name": "up42_stage"
}
],
"name": "up42",
"materials": [
{
"description": "URL: https://github.com/gocd/gocd, Branch: master",
"fingerprint": "2d05446cd52a998fe3afd840fc2c46b7c7e421051f0209c7f619c95bedc28b88",
"type": "Git"
}
],
"label": "${COUNT}"
}
],
"name": "first"
}
]
You can grab the config.xml file, either from the config repo or via HTTP, and parse it.
As an alternative, you can just get the cctray file from your server at http://yourgoserver/go/cctray.xml and parse it.
It contains information about all the pipelines (including its stages)
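A minimal sketch of parsing the cctray file, assuming each Project element's name attribute has the form "pipeline :: stage" or "pipeline :: stage :: job". That naming convention is an assumption here, so check it against your server's actual output. The function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def count_pipelines_in_cctray(xml_text):
    """Count distinct pipeline names in a cctray.xml document.

    Assumes Project names look like "pipeline :: stage[ :: job]".
    """
    root = ET.fromstring(xml_text)
    names = set()
    for project in root.iter("Project"):
        # The part before the first " :: " is taken as the pipeline name.
        names.add(project.get("name", "").split(" :: ")[0])
    names.discard("")
    return len(names)
```

The XML text itself can be fetched with any HTTP client and passed in as a string.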
I would recommend using yagocd:
from yagocd import Yagocd
go = Yagocd(server='https://build.gocd.io')
# login as guest
go._session.get('https://build.gocd.io/go/plugin/interact/gocd.guest.user.auth.plugin/index')
print(len(list(go.pipelines)))
Yes, of course. You can get the desired output in different ways. The easiest is the GoCD support URL (https://example.com/go/api/support), which returns the number of pipelines along with other statistical information but requires admin privileges.
If the user does not have admin privileges, use the GoCD pipeline_groups API. The command below should give you the exact result, using jq (a JSON processor):
$ curl 'https://example.com/go/api/config/pipeline_groups' -u 'username:password' | jq -r '.[] | .pipelines[].name' | wc -l
NOTE: Even so, only Go Administrator users will get the actual number of pipelines.
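If jq is not available, the same count can be sketched in Python. The function below assumes the response body has the shape shown in the pipeline_groups example earlier in this thread (a list of groups, each with a "pipelines" list); count_pipelines is a hypothetical name.

```python
import json


def count_pipelines(groups_json):
    """Sum the pipelines across all groups in a pipeline_groups response body."""
    groups = json.loads(groups_json)
    return sum(len(group.get("pipelines", [])) for group in groups)
```

Pass only the JSON body, not the HTTP status line and headers.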

django rest framework post - Invalid drive specification

I'm trying to post an XML file to my API. Although it works fine from the URL, when I try to curl the XML file in, I get an "Invalid drive specification" error.
This is my CURL command -
curl -X POST -d 5022_4qa.xml http://servername:9001/deploy/calendar/&format=xml
As soon as I try the curl command I get back a few errors before it fails. My assumption is that it's not grabbing the XML file for some reason. Even if I put the full path of the file, the error is the same.
....
</div>{
"evntmst_type": [
"This field is required."
],
"evntmst_id": [
"This field is required."
],
"evntmst_name": [
"This field is required."
]
}</pre>
....
Invalid drive specification
In the return on the API side it's returning a 400 code.
To post the contents of a file with curl, you need to prefix the file name with @. So:
curl -X POST -d @5022_4qa.xml http...
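To illustrate why the file name alone does not work: curl treats a -d argument that begins with @ as a file to read, and anything else as the literal data to send. The Python sketch below mimics that rule in simplified form (it ignores special cases such as reading from stdin); resolve_data_argument is a made-up name, not part of curl.

```python
def resolve_data_argument(arg):
    """Return the bytes that would be sent for a given -d argument (simplified)."""
    if arg.startswith("@"):
        with open(arg[1:], "rb") as handle:
            data = handle.read()
        # Plain -d drops carriage returns and newlines from file contents;
        # --data-binary would send the file unchanged.
        return data.replace(b"\r", b"").replace(b"\n", b"")
    # Without the @ prefix, the argument itself is the request body.
    return arg.encode("utf-8")
```

With "5022_4qa.xml" as the argument, the body is just the 12 bytes of the file name, which is consistent with the "This field is required." errors shown in the question.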