BASH: Regex -> empty result

I'm using bash, but I do not get a BASH_REMATCH... Every online regex check tool worked fine with this string and regex.
#!/bin/bash
set -x
regex='hd_profile_pic_url_info": {"url": "([0-9a-zA-Z._:\/\-_]*)"'
str='{"user": {"pk": 12345, "username": "dummy", "full_name": "dummy", "is_private": true, "profile_pic_url": "censored", "profile_pic_id": "censored", "is_verified": false, "has_anonymous_profile_picture": false, "media_count": 0, "follower_count": 71114, "following_count": 11111, "biography": "", "external_url": "", "usertags_count": 0, "hd_profile_pic_versions": [{"width": 320, "height": 320, "url": "censored"}, {"width": 640, "height": 640, "url": "censored"}], "hd_profile_pic_url_info": {"url": "https://scontent-frt3-2.cdninstagram.com/vp/censored/censored_a.jpg", "width": 930, "height": 930}, "has_highlight_reels": false, "auto_expand_chaining": false}, "status": "ok"}'
[[ $str =~ $regex ]] && echo ${BASH_REMATCH}

Parsing JSON with bash is not a good idea; as others have said, jq is the right tool for the job.
Having said that, I think
regex='hd_profile_pic_url_info": {"url": "[0-9a-zA-Z._:\/_-]*"'
would work. Notice the '-' as the last character in the set, so it is not interpreted as a range.
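For reference, a minimal jq sketch of the same extraction, assuming jq is installed and the JSON from the question is in $str:
jq -r '.user.hd_profile_pic_url_info.url' <<< "$str"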

You have to remove the duplicate _ at the end of your regex, so that - is the last character in the set and is matched literally:
regex='"hd_profile_pic_url_info": {"url": "([0-9a-zA-Z._:\/\-]*)"'

Related

Problems with Image Label Adjustment Job in Amazon Sagemaker Ground Truth

I'm trying to create an Image Label Adjustment Job in Ground Truth and I'm having some trouble. The thing is that I have a dataset of images in which there are pre-made bounding boxes. I have an external Python script that creates the "dataset.manifest" file with the JSON for each image. Here are the first four lines of that manifest file:
{"source-ref": "s3://automatic-defect-detection/LM-WNB1-M-0000126254-camera_2_0022.jpg", "bounding-box": {"image_size": [{"width": 2048, "height": 1536, "depth": 3}], "annotations": [{"class_id": 0, "width": 80, "height": 80, "top": 747, "left": 840}]}, "bounding-box-metadata": {"class-map": {"0": "KK"}, "type": "groundtruth/object-detection", "human-annotated": "yes"}}
{"source-ref": "s3://automatic-defect-detection/LM-WNB1-M-0000126259-camera_2_0028.jpg", "bounding-box": {"image_size": [{"width": 2048, "height": 1536, "depth": 3}], "annotations": [{"class_id": 0, "width": 80, "height": 80, "top": 1359, "left": 527}]}, "bounding-box-metadata": {"class-map": {"0": "KK"}, "type": "groundtruth/object-detection", "human-annotated": "yes"}}
{"source-ref": "s3://automatic-defect-detection/LM-WNB1-M-0000126256-camera_3_0006.jpg", "bounding-box": {"image_size": [{"width": 2048, "height": 1536, "depth": 3}], "annotations": [{"class_id": 3, "width": 80, "height": 80, "top": 322, "left": 1154}, {"class_id": 3, "width": 80, "height": 80, "top": 633, "left": 968}]}, "bounding-box-metadata": {"class-map": {"3": "FF"}, "type": "groundtruth/object-detection", "human-annotated": "yes"}}
{"source-ref": "s3://automatic-defect-detection/LM-WNB1-M-0000126253-camera_2_0019.jpg", "bounding-box": {"image_size": [{"width": 2048, "height": 1536, "depth": 3}], "annotations": [{"class_id": 2, "width": 80, "height": 80, "top": 428, "left": 1058}]}, "bounding-box-metadata": {"class-map": {"2": "DD"}, "type": "groundtruth/object-detection", "human-annotated": "yes"}}
Now the problem is that I'm creating private jobs in Amazon SageMaker to try it out. I have the manifest file and the images in an S3 bucket, and it actually kind of works. I select the input manifest and activate the "Existing-labels display options". The existing labels for the bounding boxes do not appear automatically, so I have to enter them manually (I don't know why), but if I do that and try the preview before creating the adjustment job, the bounding boxes appear perfectly and I can adjust them. The thing is that, even though I am the only worker invited to the job, the task never appears for me to work on, and the job just auto-completes. I can see later that the images are there with my pre-made bounding boxes, but the job never shows up so I can adjust those boxes. I don't have the "Automated data labeling" option activated. Is there something missing in my manifest file?
There can be multiple reasons for this. First of all, the automated labeling option is not supported for label adjustment and verification tasks, so that's ruled out.
It looks like you may not have set up the adjustment job properly. Some things to check:
Have you specified the Task timeout and Task expiration time? If these values are too low, the tasks can expire before anybody can pick them up.
Have you checked the "I want to display existing labels from the dataset for this job." box? It should be checked in your case.
Are your existing labels fetched properly? If they are not fetched correctly, you either need to review your manifest file or provide the label values manually (which I guess you are doing).
Since you are the only worker in the workforce, do you have the correct permissions to access the labeling task?
How many images do you have? Have you set a minimum batch size while setting up the label adjustment job?
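If all of the above looks right and the job still auto-completes, it can help to inspect the job itself. A minimal sketch with the AWS CLI, assuming a job named my-adjustment-job (substitute your own job name): DescribeLabelingJob reports the job status and, when something went wrong, a failure reason.
aws sagemaker describe-labeling-job --labeling-job-name my-adjustment-job --query '{Status: LabelingJobStatus, Reason: FailureReason}'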

How to use PowerShell and regular expressions to parse a text file

I am new to PowerShell and need to read in a text file, parse it to extract the data we need, and write the result to a .csv file. However, at this point I am still unable to parse the file and am totally confused about which PS commands to use and how to incorporate regular expressions. While I could write out all of the ways I've tried to get this to work, I think it would be more beneficial to just ask for help and then ask questions on anything I don't fully understand. FYI: we're running Win10 and my only 2 scripting options are batch or PowerShell.
We have a JSON file that was formatted by Notepad++ and looks like this:
"issue": [{
"field": [{
"name": "someName",
"value": [],
"values": []
}],
"field": [{
"name": "numberinproject",
"value": ["81"],
"values": ["81"]
}],
"field": [{
"name": "summary",
"value": ["This is a summary for 81."],
"values": ["This is a summary for 81."]
}],
"comment":[{
"text": "someText for 81 - 01",
"markdown":false,
"created":0123456789101,
"updated":null,
"Author":"first.last01",
"permitted group":null
},{
"text": "someText for 81 - 02",
"markdown":false,
"created":0123456789102,
"updated":null,
"Author":"first.last02",
"permitted group":null
},{
"text": "someText for 81 - 03",
"markdown":false,
"created":0123456789103,
"updated":null,
"Author":"first.last03",
"permitted group":null
}],
"field": [{
"name": "someNameTwo",
"value": [],
"values": []
}],
"field": [{
"name": "numberinproject",
"value": ["83"],
"values": ["83"]
}],
"field": [{
"name": "summary",
"value": ["This is a summary for 83."],
"values": ["This is a summary for 83."]
}],
"comment":[]
}
]
What I am attempting to do is extract the numberinproject and summary fields, plus each comment's text, created, and Author values.
Notice that there can be zero to many comments per project number. The comment created field is a 13-digit epoch value (milliseconds) that has to be converted into mm/dd/yyyy hh:mm:ss AM/PM.
I had hoped to export this data into a .csv file but at this time would be happy just getting the data parsed out of the file.
Thanks for whatever feedback you can give.
===================================================
By request: here are some of the things I tried, and I apologise for this being such a mess. Since the "json" file was not in a format that ConvertFrom-Json could use, I assumed the file was actually text, and that is where this starts.
What I've picked up has been from searching the web. If anyone can suggest a good article, please let me know and I will read it.
Set-Variable -Name "inputFile" -Value "inputFile.txt"
Set-Variable -Name "outputTXTFile" -Value "outputTXTFile.txt"
Set-Variable -Name "outputFile" -Value "outputFile.csv"
# The patterns I worked out for each field:
numberinProject = \"value\"\:\s\[\"\d+
summary = \"value\"\:\s\[\".+\"\],
comment - text = \"text\"\:\s\".+\",
comment - created = \"created\"\:\d{13}
comment - author = \"Author\"\:\"\w+\.\w+
## This actually worked. Though it grabbed the whole line, my plan was to then parse it for a substring.
$results = Get-Content -Path $inputFile | Select-String -Pattern '"values": ' -CaseSensitive -SimpleMatch
# ------------------------------------------------
# However, if I tried using regex, the parse failed
# (the pattern must be quoted, and -SimpleMatch disables regex matching, so the pattern is treated as a literal string)
$results = Get-Content -Path $inputFile | Select-String -Pattern '\"values\"\:\s\[\"\d+' -CaseSensitive -SimpleMatch
# I also tried this
#$A = Get-ChildItem $inputFile | Select-String -Pattern '(?<ID>\"value\"\:\s\[\"\d+)'
# $results | Export-CSV $outputFile -NoTypeInformation
$results | Out-File $outputTXTFile
# ---------------------------------------------------------
# I tried to output the file as a single string for manipulation - it didn't work
(Get-Content -Path $inputFile) -join "`r`n" | Out-File $outputTXTFile
# I tried to use "patterns" to find the data but that didn't work
$issueIDPattern = "(?<ID>\"value\"\:\s\[\"\d+)"
$summaryPattern = "\"value\"\:\s\[\".+\"\],"
$commentTextPattern = "\"text\"\:\s\".+\","
$commentDatePattern = "\"created\"\:\d{13}"
$commentAuthorPattern = "\"Author\"\:\"\w+\.\w+"
Get-ChildItem $inputFile|
Select-String -Pattern $issueIDPattern |
Foreach-Object {
$ID = $_.Matches[0].Groups['ID'].Value
[PSCustomObject]@{
issueNum = $ID
}
}
### Also tried a variation of this
(Get-Content C:\Path\To\File.txt) -join "`r`n" -split "(?m)^(?=\S)" |
Where{$_} |
ForEach{
Clear-Variable commentauthor,commentcreated,commenttext,summary,numberinProject
$commentcreated = @()
$numberinProject = ($_ -split "`r`n")[0].trim()
Switch -regex ($_ -split "`r`n"){
"^\s+summary:" {$summary = ($_ -split ':',2)[-1].trim();Continue}
"^\s+.:\\" {$commentcreated += $_.trim();continue}
"^\s+commenttext" {$commenttext = [RegEx]::Matches($_,"(?<=commenttext installed from )(.+?)(?= \[)").value;continue}
}
[PSCustomObject]@{'numberinProject' = $numberinProject;'summary' = $summary; 'commenttext' = $commenttext; 'commentcreated' = $commentcreated}
}

awk (or sed/grep) to get occurrences of substring

I have a json string in a bash variable, which is something like this:
{
"items": [
{
"foo": null,
"timestamp": 1553703000,
"bar": 123
},
{
"foo": null,
"timestamp": 1553703200,
"bar": 456
},
{
"foo": null,
"timestamp": 1553703400,
"bar": 789
}
]
}
I want to know how many of those timestamps are after a given datetime, so if I have 1553703100 it'll return 2.
(Bonus imaginary points if you can get me just that number!)
As a step towards that, I want to get just the matches of "timestamp": \d+, in the string so that I can loop through them in a bash script.
I've used sed and grep a bit, but never used awk, and from my reading it seems like that might be the better match for the task.
Other info:
- The json is already pretty-printed, as above, so the timestamps would always be on separate lines.
- This is to run in Cygwin, so I have awk/gawk, sed, and grep/egrep, but probably not others.
- Could be any number of timestamps in the json.
You didn't provide the expected output, so it's a guess, but is this what you're trying to do?
$ echo "$var" | jq '.items[].timestamp'
1553703000
1553703200
1553703400
or maybe:
$ echo "$var" | jq '.items[].timestamp | select(. > 1553703100)'
1553703200
1553703400
or:
$ echo "$var" | jq '[.items[].timestamp | select(. > 1553703100)] | length'
2
WARNING: I'm just learning jq so there may be better ways to do the above!
edit: The second approach listed below has serious problems that were very helpfully outlined by @EdMorton. I've elected to keep the old code for educational purposes.
Avoided substr() by forcing a numeric comparison with $2+0, and handled an unset i (i+0 prints 0 when there are no matches):
$ awk -v dt=1553703100 '
/timestamp/ && $2+0>dt {i++}
END {print i+0}
' <<< "$var"
2
WARNING: PROBLEMATIC CODE
Here I used substr(string, index, [characters]) to trim the comma off your second field. The /timestamp/ regex is not complex; it could be improved if your JSON became more intricate.
$ awk -v dt=1553703100 '
/timestamp/ && substr($2, 0, length($2)) > dt {i++}
END {print i}
' <<< "$var"
2
You can also quickly implement a Python solution:
input:
$ cat data.json
{
"items": [
{
"foo": null,
"timestamp": 1553703000,
"bar": 123
},
{
"foo": null,
"timestamp": 1553703200,
"bar": 456
},
{
"foo": null,
"timestamp": 1553703400,
"bar": 789
}
]
}
code:
$ cat extract_value2.py
import json
tLimit = 1553703100
with open('data.json') as f:
    data = json.load(f)
print([t['timestamp'] for t in data["items"] if t['timestamp'] > tLimit])
output:
$ python extract_value2.py
[1553703200, 1553703400]
count code:
$ cat extract_value2.py
import json
tLimit = 1553703100
with open('data.json') as f:
    data = json.load(f)
print(len([t['timestamp'] for t in data["items"] if t['timestamp'] > tLimit]))
output:
$ python extract_value2.py
2
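For completeness, since the question asked about grep: you can pull out just the "timestamp": \d+ matches with grep -o and count the ones past the cutoff with awk. A minimal sketch, assuming the pretty-printed layout shown above:
$ grep -o '"timestamp": [0-9]*' <<< "$var" | awk -F': ' -v dt=1553703100 '$2+0 > dt {c++} END {print c+0}'
2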

Python 2.7.9 subprocess convert check_output to dictionary (volumio)

I've been searching for a long time, but I can't find an answer.
I'm making a script for Volumio on my Raspberry Pi.
In the terminal, when I type
volumio status
I get exactly
{
"status": "pause",
"position": 0,
"title": "Boom Boom",
"artist": "France Gall",
"album": "Francegall Longbox",
"albumart": "/albumart?cacheid=614&web=France%20Gall/Francegall%20Longbox/extralarge&path=%2FUSB&metadata=false",
"uri": "/Boom Boom.flac",
"trackType": "flac",
"seek": 21192,
"duration": 138,
"samplerate": "44.1 KHz",
"bitdepth": "16 bit",
"channels": 2,
"random": true,
"repeat": null,
"repeatSingle": false,
"consume": false,
"volume": 100,
"mute": false,
"stream": "flac",
"updatedb": false,
"volatile": false,
"service": "mpd"
}
In Python, I would like to store this in a dictionary.
Since it already has the right formatting, I thought that assigning it to a variable would make it a dictionary right away, as follows:
import subprocess, shlex
cmd = "volumio status | sed -e 's/true/True/g' -e 's/false/False/g' -e 's/null/False/g'"
cmd = shlex.split(cmd)
status = subprocess.check_output(cmd)
print status["volume"]
If what I thought were true, I would get "100". Instead, I get this error:
File "return.py", line 7, in <module>
print status["volume"]
TypeError: string indices must be integers, not str
This means status is stored as a string. Does anybody know how I can make it a dictionary?
dict() doesn't do it; I get:
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Victory! I was able to make my code work with eval():
import subprocess
status = subprocess.check_output("volumio status | sed -e 's/true/True/g' -e 's/false/False/g' -e 's/null/False/g'", shell=True)
status = eval(status)
print status["volume"]
It returns 100.
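A note on the eval() approach: the volumio output shown above is already valid JSON (null, true, and false are JSON literals), so the sed rewriting and eval() are avoidable, and eval() on external command output is a code-execution risk. A minimal sketch of the same lookup through a real JSON parser, assuming volumio is on the PATH:
volumio status | python -c 'import json, sys; print(json.load(sys.stdin)["volume"])'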

Use Sublime3 SFTP on EC2

I am trying to edit files on EC2 remotely. I spent a while setting up the config.json, but I still get a timeout error.
I am using a Mac and I have already run chmod 400 on the .pem file.
{
"type": "sftp",
"sync_down_on_open": true,
"host": "xxx.xx.xx.xxx",
"user": "ubuntu",
"remote_path": "/home/ubuntu/",
"connect_timeout": 30,
"sftp_flags": ["-o IdentityFile=/Users/kevinzhang/Desktop/zhang435_ec2.pem"],
}
I figured it out. Just in case anyone else has the same problem:
I am on macOS and the instance runs Ubuntu; the config file I have now looks like this:
{
// The tab key will cycle through the settings when first created
// Visit http://wbond.net/sublime_packages/sftp/settings for help
// sftp, ftp or ftps
"type": "sftp",
// "save_before_upload": true,
"upload_on_save": true,
"sync_down_on_open": true,
"sync_skip_deletes": false,
"sync_same_age": true,
"confirm_downloads": false,
"confirm_sync": true,
"confirm_overwrite_newer": false,
"host": "xxxx.compute.amazonaws.com",
"user": "ubuntu",
//"password": "password",
"port": "22",
"remote_path": "/home/ubuntu/",
"ignore_regexes": [
"\\.sublime-(project|workspace)", "sftp-config(-alt\\d?)?\\.json",
"sftp-settings\\.json", "/venv/", "\\.svn/", "\\.hg/", "\\.git/",
"\\.bzr", "_darcs", "CVS", "\\.DS_Store", "Thumbs\\.db", "desktop\\.ini"
],
//"file_permissions": "664",
//"dir_permissions": "775",
//"extra_list_connections": 0,
"connect_timeout": 30,
//"keepalive": 120,
//"ftp_passive_mode": true,
//"ftp_obey_passive_host": false,
"ssh_key_file": "~/.ssh/id_rsa",
"sftp_flags": ["-o IdentityFile=<YOUR.PEM FILE path>"],
//"preserve_modification_times": false,
//"remote_time_offset_in_hours": 0,
//"remote_encoding": "utf-8",
//"remote_locale": "C",
//"allow_config_upload": false,
}
If you have a permission problem:
chmod -R 0777 /home/ubuntu/YOURFILE/
This enables read, write, and execute for all users, so use it with care.
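A less drastic alternative, assuming your login user is ubuntu, is to take ownership of the directory instead of opening it to everyone. A sketch:
sudo chown -R ubuntu:ubuntu /home/ubuntu/YOURFILE/
chmod -R u+rwX /home/ubuntu/YOURFILE/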
You may want to create a new user if the above does not work for you:
https://habd.as/sftp-to-ubuntu-server-sublime-text/
I do not know if this makes a difference, but it started working for me for both users once I created the new user.
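If the connection still times out, it can help to verify the key and host outside Sublime first. A quick check from a terminal, assuming the .pem path and host from the question:
ssh -i /Users/kevinzhang/Desktop/zhang435_ec2.pem ubuntu@xxx.xx.xx.xxx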