What tool to use for this regex?

I am using a combination of Automator, Bash, and Exiftool to take filenames like this: 0615090217.jpg and change the date created to 2009:06:15 02:17:00.
Most of the pieces of the puzzle are working, and I even have a working regex; I just don't know how to apply it using bash or some combination of other tools. I've seen sed suggested, but I don't know how to apply it.
The following regex works here, but I don't know how to apply it in my setup:
Expression: /(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(.*)\.[^.]+$/g
Substitution: \n20$3:$1:$2 $4:$5:00\n\t
Text: 0615090217.jpg
The shell script in my Automator workflow looks like this:
for f in "$@"
do
FILENAME=$(basename "$f")
MYDATE='2010:07:09 12:22:00'
/usr/local/bin/exiftool -overwrite_original_in_place -preserve "-AllDates=${MYDATE}" "$f"
done
I want to replace MYDATE with a date extracted from the filename, using my regex or some other method. I feel like I'm close, it's just connecting the final dots.

If you're using OSX, FreeBSD, NetBSD, etc., then the date command lets you easily convert from one format to another:
#!/usr/bin/env bash
for f in "$@"
do
FILENAME=$(basename "$f")
MYDATE=$(date -j -f '%m%d%y%H%M.jpg' "$FILENAME" '+%Y-%m-%d %H:%M:00')
/usr/local/bin/exiftool \
-overwrite_original_in_place \
-preserve "-AllDates=${MYDATE}" \
"$f"
done
You can also achieve this by ripping apart the filename using bash's "Parameter Expansion", but that takes more typing.
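For reference, here is what that parameter-expansion route might look like. A minimal sketch, assuming the filename always matches the MMDDYYHHMM.ext pattern from the question:

```shell
# Slice the date fields out of a name like 0615090217.jpg with
# bash substring expansion: ${var:offset:length}.
FILENAME=0615090217.jpg
mm=${FILENAME:0:2} dd=${FILENAME:2:2} yy=${FILENAME:4:2}
hh=${FILENAME:6:2} min=${FILENAME:8:2}
MYDATE="20${yy}:${mm}:${dd} ${hh}:${min}:00"
echo "$MYDATE"    # 2009:06:15 02:17:00
```

MYDATE can then be passed to exiftool exactly as in the loop above, with no external date call needed.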

Exiftool can do this by itself; there's no need for scripts, which would only slow the whole process down by invoking exiftool once per file.
Try something like:
/usr/local/bin/exiftool -overwrite_original_in_place -preserve '-AllDates<${Filename;s/(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(.*)\.[^.]+$/20$3:$1:$2 $4:$5:00/}' DIR
I just lifted your regex and stuck it in, so test it out first, of course. My quick test here worked correctly, output below.
c:\>exiftool -g1 -alldates X:\!temp\0615090217.jpg
---- IFD0 ----
Modify Date : 2012:08:30 22:25:33
---- ExifIFD ----
Date/Time Original : 2013:18:08 19:04:15
Create Date : 2012:08:30 22:25:33
c:\>exiftool "-AllDates<${Filename;s/(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(.*)\.[^.]+$/20$3:$1:$2 $4:$5:00/}" X:\!temp\0615090217.jpg
1 image files updated
c:\>exiftool -g1 -alldates X:\!temp\0615090217.jpg
---- IFD0 ----
Modify Date : 2009:06:15 02:17:00
---- ExifIFD ----
Date/Time Original : 2009:06:15 02:17:00
Create Date : 2009:06:15 02:17:00

Related

Is it possible to let DCMTK's writeJson() write tag names?

I am using the DCMTK library in my program, which among other things writes a JSON file. With the DcmDataset::writeJson() function I can put the whole header in the JSON in one call, which is very handy, but the tags are listed by offset, not name.
This is the same as with the command-line program dcm2json, which writes a JSON file where each tag is represented by an 8-digit string of its offset.
The other command-line tool for getting this information, dcmdump gives this for the slice location:
$ dcmdump $dcmfile | grep SliceLocation
(0020,1041) DS [-67.181462883113] # 16, 1 SliceLocation
and I can do
$ dcm2json $dcmfile | grep -n3 67.181462883113
1552- "00201041": {
1553- "vr": "DS",
1554- "Value": [
1555: -67.181462883113
1556- ]
1557- },
1558- "00280002": {
to find it in the JSON stream, or even (the C++ equivalent of)
$ dcm2json $dcmfile | grep -n3 $(dcmdump $dcmfile | grep SliceLocation | awk '{print $1}' | tr "()," " " | awk '{print $1$2}')
but that feels like a very roundabout way to do things.
Is there a way to write out a JSON directly with the name of the DICOM tags, or another way to combine the DcmDataset::writeJson() and dcmdump functionality?
The output format of dcm2json is defined by the DICOM standard (see PS3.18 Chapter F), so there is no way to add the Attribute Names/Keywords. However, you might want to try dcm2xml, which supports both a DCMTK-specific output format and the Native DICOM Model (see PS3.19 Chapter A.1). Both formats make use of the official Keywords that are associated with each DICOM Attribute (see PS3.6 Section 6).

Extracting part of lines with specific pattern and sum the digits using bash

I am just learning bash scripting and commands, and I need some help with this assignment.
I have a txt file that contains the following text, and I need to:
Extract the guest name (1.1.1 ...)
Sum each guest's results and output the guest name with the result.
I used sed with a simple regex to extract the name and the digits, but I have no idea how to sum the numbers, because each guest has records on multiple lines, as you can see in the txt file. Note: I can't use awk for processing.
Here is my code:
cat file.txt | sed -E 's/.*([0-9]{1}.[0-9]{1}.[0-9]{1}).*([0-9]{1})/\1 \2/'
And result is:
1.1.1 4
2.2.2 2
1.1.1 1
3.3.3 1
2.2.2 1
Here is the .txt file:
Guest 1.1.1 have "4
Guest 2.2.2 have "2
Guest 1.1.1 have "1
Guest 3.3.3 have "1
Guest 2.2.2 have "1
and the output should be:
1.1.1 = 5
2.2.2 = 3
3.3.3 = 1
Thank you in advance
I know your teacher won't let you use awk, but since beyond this one exercise you're trying to learn how to write shell scripts, FYI here's how you'd really do this job in a shell script:
$ awk -F'[ "]' -v OFS=' = ' '{sum[$2]+=$NF} END{for (id in sum) print id, sum[id]}' file
3.3.3 = 1
2.2.2 = 3
1.1.1 = 5
and here's a bash-builtins equivalent, which may or may not be what you've covered in class and so may or may not be what your teacher is expecting:
$ cat tst.sh
#!/usr/bin/env bash
declare -A sum
while read -r _ id _ cnt; do
(( sum[$id] += "${cnt#\"}" ))
done < "$1"
for id in "${!sum[@]}"; do
printf '%s = %d\n' "$id" "${sum[$id]}"
done
$ ./tst.sh file
1.1.1 = 5
2.2.2 = 3
3.3.3 = 1
See https://www.artificialworlds.net/blog/2012/10/17/bash-associative-array-examples/ for how I'm using the associative array. It'll be orders of magnitude slower than the awk script, and I'm not 100% sure it's bullet-proof (since shell isn't designed to process text, there are a LOT of caveats and pitfalls), but it'll work for the input you provided.
OK -- since this is a class assignment, I will tell you how I did it and let you write the code.
First, I sorted the file. Then I read the file one line at a time. If the name changed, I printed out the previous name and count, and set the count to the value on that line. If the name did not change, I added the value to the count.
My second solution used an associative array to hold the counts, using the guest name as the index: you just add the new value to the count in the array element indexed by the guest name.
At the end, loop through the array and print out the indexes and values.
It's a lot shorter.
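For reference, the first (sort-based) approach might be sketched in POSIX sh roughly like this, using the sample data from the question:

```shell
# Sketch of the sort-then-group approach (assumes the exact
# 'Guest N.N.N have "X' line format shown in the question).
cd "$(mktemp -d)"
cat > file.txt <<'EOF'
Guest 1.1.1 have "4
Guest 2.2.2 have "2
Guest 1.1.1 have "1
Guest 3.3.3 have "1
Guest 2.2.2 have "1
EOF

sort file.txt | {
  prev= total=0
  while read -r _ id _ cnt; do
    cnt=${cnt#\"}                             # strip the stray leading quote
    if [ "$id" != "$prev" ]; then             # name changed: flush old group
      [ -n "$prev" ] && echo "$prev = $total"
      prev=$id total=0
    fi
    total=$((total + cnt))
  done
  [ -n "$prev" ] && echo "$prev = $total"     # flush the final group
}
```

The final flush happens inside the { } block because a pipeline's while loop runs in a subshell, so variables set there are invisible after the pipeline ends.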

How do I get a list of all of the XML tags in a file using PowerShell and Regular Expressions?

This question is related to RegEx find all XML tags, but I'm trying to do it in Windows PowerShell.
I have an XML file that contains many different XML tags, and the file is huge, so basically I want to use RegEx to parse the file and spit out the names of all the tags as a list. The XML document is not a valid XML document, even though it contains XML tags and elements, so using the XML functions of PowerShell won't work; I get many errors when trying to view it as an XML document, thus the need to use RegEx.
I've determined that the following RegEx identifies the tags (thanks to the related question mentioned above): (?<=<)([^\/]*?)((?= \/>)|(?=>))
Here's a very small snippet of the file I'm parsing:
<data><bp_year /><bp_make>John Deere</bp_make><bp_model>650</bp_model><bp_price>3000.00</bp_price><bp_txtDayPhone>555-555-5555</bp_txtDayPhone><bp_bestPrice>3000.0000</bp_bestPrice><bp_txtComments>Best price available?</bp_txtComments><bp_url>https://www.example.com</bp_url></data>
<data><receiveOffers /><link>http://example.com/inventory.htm?id=2217405&used=1</link><itemName>2007 Yamaha RHINO 660</itemName></data>
<data><vehicleYear>2008</vehicleYear><vehicleMake>Buick</vehicleMake><vehicleModel>Enclave</vehicleModel><vehicleStyle>CX</vehicleStyle><vehicleInformation /><vehicleMileage /><phone>555-555-5555</phone><timeOfDay>Morning</timeOfDay><message /></data>
<data><mo_year>2009</mo_year><mo_make>Webasto</mo_make><mo_model>Air Top 2000</mo_model><mo_price /><mo_txtDayPhone>555-555-5555</mo_txtDayPhone><mo_txtOffer>700</mo_txtOffer><mo_txtTrade /><mo_txtComments /></data>
I really don't have much experience with PowerShell, but from my understanding you can do grep-style searching with it. After searching around on the internet, I found some resources that helped point me towards my solution, via the PowerShell Select-String command.
I've attempted the following PowerShell command, but it gives me way too much feedback; I just want a master "Matches" list.
Select-String -Path '.\dataXML stuff - Copy.xml' -Pattern "(?<=<)([^\/]*?)((?= \/>)|(?=>))" -AllMatches | Format-List -Property Matches
Sample of Output generated:
Matches : {data, vehicleYear, vehicleMake, vehicleModel...}
Matches : {data, address, city, region...}
Matches : {data, vehicleYear, vehicleMake, vehicleModel...}
Matches : {data, vehicleYear, vehicleMake, vehicleModel...}
Matches : {data, address, city, region...}
Matches : {data, vehicleYear, vehicleMake, vehicleModel...}
Matches : {data, vehicleYear, vehicleMake, vehicleModel...}
Matches : {data, mo_year, mo_make, mo_model...}
Basically, I want something like:
data
vehicleYear
vehicleMake
vehicleModel
address
city
region
mo_year
mo_make
mo_model
and so on and on....
Where only the matched strings are returned and listed, rather than telling me what matched on each line of the XML file. I prefer the list format because then I can pump it into Excel, get a distinct list of tag names, and start actually doing what I need to accomplish; the overwhelming number of different XML tags, and not knowing what they are, is what's holding me up.
Maybe Select-String isn't the best method to use, but I feel like I'm close to my solution after finding this Microsoft post:
https://social.technet.microsoft.com/Forums/windowsserver/en-US/d5bbd2fb-c8fa-43ed-b432-79ebfeee82ea/return-only-matches-from-selectstring?forum=winserverpowershell
Basically, here's the solution modified to fit my needs:
Gc 'C:\Documents\dataXML stuff - Copy.xml'|Select-String -Pattern "(?<=<)([^\/]*?)((?= \/>)|(?=>))"|foreach {$_.matches}|select value
It provides a list of all the XML tags, just like I wanted, except it only returns the first XML tag of each line, so I get a lot of:
data
data
data
but no vehicleYear, vehicleMake, vehicleModel, etc., which would have been the 2nd, 3rd, or 11th XML tag of that line.
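(In PowerShell terms, the missing piece is Select-String's -AllMatches switch; without it, $_.Matches holds only the first match per line.) For comparison, the same per-match extraction can be sketched in a plain shell, where grep -o prints every match on its own line:

```shell
# Sketch: print every opening-tag name, one per line, then de-duplicate.
# Assumes tags have no attributes, as in the sample data above.
printf '%s\n' '<data><bp_year /><bp_make>John Deere</bp_make></data>' |
  grep -oE '<[^/> ]+' |   # opening tags only (skips </...> closers)
  tr -d '<' |
  sort -u
```

Run against the real file, this yields exactly the distinct tag-name list described above.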
As for ...
Like I mentioned earlier in the post, I do not use PowerShell at all
Reading is a good thing, but seeing it in action is better. There are many free video resources for learning PowerShell from the beginning, and tons of references. Then there are the MS TechNet virtual labs to leverage.
See this post for folks providing some paths for learning PowerShell.
Does anyone have any experience teaching others powershell?
https://www.reddit.com/r/PowerShell/comments/7oir35/help_with_teaching_others_powershell
Sure you could do it with RegEx, but it is best to handle it natively.
In PowerShell, XML is a big deal, as is JSON. All the help files are just XML files. There are built-in cmdlets to deal with it.
# Get parameters, examples, full and Online help for a cmdlet or function
Get-Command -Name '*xml*' | Format-Table -AutoSize
(Get-Command -Name Select-Xml).Parameters
Get-help -Name Select-Xml -Examples
Get-help -Name Select-Xml -Full
Get-help -Name Select-Xml -Online
Get-Help about_*
# Find all cmdlets / functions with a target parameter
Get-Help * -Parameter xml
# All Help topics locations
explorer "$pshome\$($Host.CurrentCulture.Name)"
And many sites that present articles on dealing with it.
PowerShell Data Basics: XML
To master PowerShell, you must know how to use XML. XML is an essential data interchange format because it remains the most reliable way of ensuring that an object's data is preserved. Fortunately, PowerShell makes it all easy, as Michael Sorens demonstrates.
https://www.red-gate.com/simple-talk/sysadmin/powershell/powershell-data-basics-xml
Converting XML to PowerShell PSObject
Recently, I was working on some code (of course) and had a need to convert some XML to PowerShell PSObjects. I found some snippets out there that sort of did this, but not the way that I needed for this exercise. In this case I’m converting XML meta data from Plex.
https://consciouscipher.wordpress.com/2015/06/05/converting-xml-to-powershell-psobject
Mastering everyday XML tasks in PowerShell
PowerShell has awesome XML support. It is not obvious at first, but with a little help from your friends here at PowerShellMagazine.com, you’ll soon solve every-day XML tasks – even pretty complex ones – in no time.
So let’s check out how you put very simple PowerShell code to work to get the things done that used to be so mind-blowingly complex in the pre-PowerShell era.
http://www.powershellmagazine.com/2013/08/19/mastering-everyday-xml-tasks-in-powershell
For all intents and purposes, if I just take one row of your sample and do this using the .Net xml namespace...
($MyXmlData = [xml]'<data><bp_year /><bp_make>John Deere</bp_make><bp_model>650</bp_model><bp_price>3000.00</bp_price><bp_txtDayPhone>555-555-5555</bp_txtDayPhone><bp_bestPrice>3000.0000</bp_bestPrice><bp_txtComments>Best price available?</bp_txtComments><bp_url>https://www.example.com</bp_url></data>')
data
----
data
You get results like this...
$MyXmlData.data
bp_year :
bp_make : John Deere
bp_model : 650
bp_price : 3000.00
bp_txtDayPhone : 555-555-5555
bp_bestPrice : 3000.0000
bp_txtComments : Best price available?
bp_url : https://www.example.com
with intellisense / autocomplete of the nodes / elements...
$MyXmlData.data.bp_year
Another view...
$MyXmlData.data | Format-Table -AutoSize
bp_year bp_make bp_model bp_price bp_txtDayPhone bp_bestPrice bp_txtComments bp_url
------- ------- -------- -------- -------------- ------------ -------------- ------
John Deere 650 3000.00 555-555-5555 3000.0000 Best price available? https://www.example.com
And from that, just getting the tags / names:
$MyXmlData.data.ChildNodes.Name
bp_year
bp_make
bp_model
bp_price
bp_txtDayPhone
bp_bestPrice
bp_txtComments
bp_url
So, armed with the above approaches and notes, it just becomes a matter of looping through your file to get all you are after.
Taking your sample and dumping it into a file with no changes, one can do this:
$MyXmlData = (Get-Content -Path 'D:\Scripts\MyXmlData.xml')
$MyXmlData | Format-List -Force
ForEach($DataRow in $MyXmlData)
{
($DataObject = [xml]$DataRow).Data | Format-Table -AutoSize
}
bp_year bp_make bp_model bp_price bp_txtDayPhone bp_bestPrice bp_txtComments bp_url
------- ------- -------- -------- -------------- ------------ -------------- ------
John Deere 650 3000.00 555-555-5555 3000.0000 Best price available? https://www.example.com
receiveOffers link itemName
------------- ---- --------
http://example.com/inventory.htm?id=2217405&used=1 2007 Yamaha RHINO 660
vehicleYear vehicleMake vehicleModel vehicleStyle vehicleInformation vehicleMileage phone timeOfDay message
----------- ----------- ------------ ------------ ------------------ -------------- ----- --------- -------
2008 Buick Enclave CX 555-555-5555 Morning
mo_year mo_make mo_model mo_price mo_txtDayPhone mo_txtOffer mo_txtTrade mo_txtComments
------- ------- -------- -------- -------------- ----------- ----------- --------------
2009 Webasto Air Top 2000 555-555-5555 700
ForEach($DataRow in $MyXmlData)
{
($DataObject = [xml]$DataRow).Data.ChildNodes.Name
}
bp_year
bp_make
bp_model
bp_price
bp_txtDayPhone
bp_bestPrice
bp_txtComments
bp_url
receiveOffers
link
itemName
vehicleYear
vehicleMake
vehicleModel
vehicleStyle
vehicleInformation
vehicleMileage
phone
timeOfDay
message
mo_year
mo_make
mo_model
mo_price
mo_txtDayPhone
mo_txtOffer
mo_txtTrade
mo_txtComments
Yet, note, this is not the only way to do this.

how to use grep to parse out columns in csv

I have a log with millions of lines like this:
1482364800 bunch of stuff 172.169.49.138 252377 + many other things
1482364808 bunch of stuff 128.169.49.111 131177 + many other things
1482364810 bunch of stuff 2001:db8:0:0:0:0:2:1 124322 + many other things
1482364900 bunch of stuff 128.169.49.112 849231 + many other things
1482364940 bunch of stuff 128.169.49.218 623423 + many other things
It's so big that I can't really read it into memory for Python to parse, so I want to zgrep out only the items I need into another, smaller file, but I'm not very good with grep. In Python I would normally use gzip.open(log.gz), then pull out data[0], data[4], data[5] to a new file, so my new file only has the epoch, the IP, and the count (the IP can be IPv6 or IPv4).
expected result of the new file:
1482364800 172.169.49.138 252377
1482364808 128.169.49.111 131177
1482364810 2001:db8:0:0:0:0:2:1 124322
1482364900 128.169.49.112 849231
1482364940 128.169.49.218 623423
How do I do this with zgrep?
Thanks
To select columns you have to use the cut command; zgrep/grep select lines.
So you can use the cut command like this:
cut -d' ' -f1,2,4
In this example I get columns 1, 2 and 4, with a space ' ' as the column delimiter.
You should know that the -f option is used to specify the column numbers and -d the delimiter.
I hope that I have answered your question.
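Putting the two pieces together for the sample data above might look like this sketch; the field numbers 1, 5, 6 assume "bunch of stuff" is always exactly three space-separated words:

```shell
# Real usage would be: zcat log.gz | cut -d' ' -f1,5,6 > smaller.log
# Demo on one sample line, keeping epoch (1), IP (5), and count (6):
echo '1482364800 bunch of stuff 172.169.49.138 252377 + many other things' |
  cut -d' ' -f1,5,6
```

This works for both IPv4 and IPv6 addresses, since cut only cares about the space-delimited field position, not the field's contents.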
I'm on OSX and maybe that is the issue, but I couldn't get zgrep to work for filtering out columns, and zcat kept adding a .Z at the end of the .gz. Here's what I ended up doing:
awk '{print $1,$3,$4}' <(gzip -dc /path/to/source/Largefile.log.gz) | gzip > /path/to/output/Smallfile.log.gz
This let me filter out the 3 columns I needed from the Largefile to a Smallfile while keeping both the source and destination in compressed format.

How to get only the git date in YYYY/MM/DD format?

Currently I am running: git log -1 --date=format:"%Y/%m/%d" -- /path/to/file
It outputs something like:
commit 7d1c2bcf16f7007ca900682b025ddf961fd36631
Author: John Smith
Date: 2016/06/16
[maven-release-plugin] some text
I only need the date. So far the only way I can extract just the date is by processing the output further with Node.js.
var date = require('child_process')
.execSync('git log -1 --date=format:"%Y/%m/%d" -- ./pom.xml')
.toString()
.match(/\d{4}\/\d{2}\/\d{2}/)[0];
Is it possible to only receive 2016/06/16 via the git command?
git log -1 --pretty='%ad' --date=format:'%Y/%m/%d'
%ad is author date. If you need committer date, use %cd instead.
The simple solution is to use this:
date -d @$(git log -n1 --format="%at") +%Y/%m/%d
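Note that this relies on GNU date's @epoch input syntax (%at is the author date as a Unix timestamp); BSD/macOS date would use date -r <epoch> instead. A quick sanity check with a fixed, made-up timestamp:

```shell
# 1466035200 is 2016-06-16 00:00:00 UTC; -u avoids local-timezone skew.
date -u -d @1466035200 +%Y/%m/%d    # GNU date; prints 2016/06/16
```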
You can get close results with the --date=iso format.
for example:
git log --date=iso --pretty=format:'%ad%x08%x08%x08%x08%x08%x08%x08%x08%x08%x08%x08%x08%x08%x08%aN %s'
Or this one:
git log --date=iso-local --pretty=format:'%ad'