Delphi - User specified string manipulation - regex

I have a problem in Delphi7. My application creates mpg video files according to a set naming convention i.e.
\000_A_Title_YYYY-MM-DD_HH-mm-ss_Index.mpg
In this filename the following rules are enforced:
The 000 is the video sequence. It is incremented whenever the user presses stop.
The A (or B,C,D) specifies the recording camera - so video files are linked with up to four video streams all played simultaneously.
Title is a variable length string. In my application it cannot contain a _.
The YYYY-MM-DD_HH-mm-ss is the starting time of the video sequence (not the single file)
The Index is the zero based ordering index and is incremented within 1 video sequence. That is, video files are a maximum of 15 minutes long, once this is reached a new video file is started with the same sequence number but next index. Using this, we can calculate the actual start time of the file (Filename decoded time + 15*Index)
Using this method my application can extract the starting time that the video file started recording.
Now we have a further requirement to handle arbitrarily named video files. The only thing i know for certain is there will be a YYYY-MM-DD HH-mm-ss somewhere in the filename.
How can i allow the user to specify the filename convention for the files he is importing? Something like Regular expressions? I understand there must be a pattern to the naming scheme.
So if the user inputs ?_(Camera)_*_YYYY-MM-DD_HH-mm-ss_(Index).mpg into a text box, how would i go about getting the start time? Is there a better solution? Or do i just have to handle every single possibility as we come accross them?
(I know this is probably not the best way to handle such a problem, but we cannot change the issue - the new video files are recorded by another company)

I'm not sure if your trying to parse the user input into components '?(Camera)*_YYYY-MM-DD_HH-mm-ss_(Index).mpg` but if your just trying to grab the date and time something like this, the date is in group 1, time in group 2
(\d{4}-\d{2}-\d{2})_(d{2}-\d{2}-\d{2})
Otherwise, not sure what your trying to do.

Possibly you can use the underscores "_" as your positional indicator since you smartly don't allow them in the title.
In your example of a filename convention:
?_(Camera)_*_YYYY-MM-DD_HH-mm-ss_(Index).mpg
you can parse this user-specified string to see that the date YYYY-MM-DD is always between the 3rd and 4th underscore and the time HH-mm-ss is between the 4th and 5th.
Then it becomes a simple matter when getting the actual filenames following this convention, to find the 3rd underscore and know the date and time follow it.

If you want phone-calls 24/7, then you should go for the RegEx-thing and let the user freely enter some cryptography in a TEdit.
If you want happy users and a good night sleep, then be creative and drop the boring RegEx-approach. Create your own filename-decoder by using an Angry bird approach.
Here's the idea:
Create some birds with different string manipulation personalities.
Let the user select and arrange these birds.
Execute the user generated string manipulation.
Sample code:
program AngryBirdFilenameDecoder;
{$APPTYPE CONSOLE}
uses
SysUtils;
procedure PerformEatUntilDash(var aStr: String);
begin
if Pos('-', aStr) > 0 then
Delete(aStr, 1, Pos('-', aStr));
WriteLn(':-{ > ' + aStr);
end;
procedure PerformEatUntilUnderscore(var aStr: String);
begin
if Pos('_', aStr) > 0 then
Delete(aStr, 1, Pos('_', aStr));
WriteLn(':-/ > ' + aStr);
end;
function FetchDate(var aStr: String): String;
begin
Result := Copy(aStr, 1, 10);
Delete(aStr, 1, 10);
WriteLn(':-) > ' + aStr);
end;
var
i: Integer;
FileName: String;
TempFileName: String;
SelectedBirds: String;
MyDate: String;
begin
Write('Enter a filename to decode (eg. ''01-ThisIsAText-Img_01-Date_2011-03-08.png''): ');
ReadLn(FileName);
if FileName = '' then
FileName := '01-ThisIsAText-Img_01-Date_2011-03-08.png';
repeat
TempFileName := FileName;
WriteLn('Now, select some birds:');
WriteLn('Bird No.1 :-{ ==> I''ll eat letters until I find a dash (-)');
WriteLn('Bird No.2 :-/ ==> I''ll eat letters until I find a underscore (_)');
WriteLn('Bird No.3 :-) ==> I''ll remember the date before I eat it');
WriteLn;
Write('Chose your birds: (eg. 112123):');
ReadLn(SelectedBirds);
if SelectedBirds = '' then
SelectedBirds := '112123';
for i := 1 to Length(SelectedBirds) do
case SelectedBirds[i] of
'1': PerformEatUntilDash(TempFileName);
'2': PerformEatUntilUnderscore(TempFileName);
'3': MyDate := FetchDate(TempFileName);
end;
WriteLn('Bird No.3 found this date: ' + MyDate);
WriteLn;
WriteLn;
Write('Check filename with some other birds? (Y/N): ');
ReadLn(SelectedBirds);
until (Length(SelectedBirds)=0) or (Uppercase(SelectedBirds[1])<>'Y');
end.
When you'll do this in Delphi with GUI, you'll add more birds and more checking of course. And find some nice bird glyphs.
Use two list boxes. One one the left with all possible birds, and one on the right with all the selected birds. Drag'n'drop birds from left to right. Rearrange (and remove) birds in the list on the right.
The user should be able to test the setup by entering a filename and see the result of the process. Internally you store the script by using enumerators etc.

Related

NLP with less than 20 Words on google cloud

According to this documentation: the classifyText method requires at least 20 words.
https://cloud.google.com/natural-language/docs/classifying-text#language-classify-content-nodejs
If I send in less than 20 words I get this no matter how clear the content is:
Invalid text content: too few tokens (words) to process.
Looking for a way to force this without disrupting the NLP too much. Are there neutral vector words that can be appended to short phrases that would allow the classifyText to process anyways?
ex.
async function quickstart() {
const language = require('#google-cloud/language');
const client = new language.LanguageServiceClient();
//less than 20 words. What if I append some other neutral words?
//.. a, of , it, to or would it be better to repeat the phrase?
const text = 'The Atlanta Braves is the best team.';
const document = {
content: text,
type: 'PLAIN_TEXT',
};
const [classification] = await client.classifyText({document});
console.log('Categories:');
classification.categories.forEach(category => {
console.log(`Name: ${category.name}, Confidence: ${category.confidence}`);
});
}
quickstart();
The problem with this is you're adding bias no matter what kind of text you send.
Your only chance is to fill up your string up to the minimum word limit with empty words that will be filtered out by the preprocessor and tokenizer before they go to the neural network.
I would try to add a string suffix at the end of the sentence with just stopwords from NLTK like this:
document.content += ". and ourselves as herserf for each all above into through nor me and then by doing"
Why the end? Because usually text has more information at the beginning.
In case Google does not filter stopwords behind the scenes (which I doubt), this would add just white noise where the network has no focus or attention.
Remember: DO NOT add this string when you have enough words because you are billed for 1K character blocks before they are filtered.
I would also add that string suffix to sencences in your train/test/validation set that have less than 20 words and see how it works. The network should learn to ignore the whole sentence.

How to manage looping on this list on Applscript?

The list is in the form of:-
0:
https://url
1:
https://url
..... And so on.
How could I loop on this list. So I could fetch the number first without ":" and type it somewhere then fetch the url that comes after that number and type it elsewhere. Then end repeat if the list is over.
Or should I use records instead?
I am still a beginner using AppleScript. I tried many commands I mixed up but the computer keeps running the script nonestop and the activity monitor shows the applescript using 100% of the processor and huge amount of ram.
Appreciate any help.
Thank you
You didn't define what your list really looks like very well so I made an assumption on my answer below. If I was wrong, hopefully my answer will at least point you in the right direction. (or if I've gotten it wrong, but you can choose to reformat it to the way I suggested, that could still help)
on run
set theList to {"0:http://apple.com", "1:http://google.com"} -- my guess at what your list looks like.
repeat with anItem in theList
set anItem to anItem as string
set itemParts to myParseItem(anItem)
set tID to the_integer of itemParts as integer
set tURL to the_url of itemParts as string
end repeat
end run
on myParseItem(theItem)
set AppleScript's text item delimiters to ":"
set delimitedList to every text item of theItem
set newString to (items 2 thru -1 of delimitedList as string)
set AppleScript's text item delimiters to {""}
set theInt to item 1 of delimitedList
set theURL to newString as string
return {the_integer:theInt, the_url:theURL}
end myParseItem

Renaming files with no fixed char length in Python

I am currently learning Python 2.7 and am really impressed by how much it can do.
Right now, I'm working my way through basics such as functions and loops. I'd reckon a more 'real-world' problem would spur me on even further.
I use a satellite recording device to capture TV shows etc to hard drive.
The naming convention is set by the device itself. It makes finding the shows you want to watch after the recording more difficult to find as the show name is preceded with lots of redundant info...
The recordings (in .mts format) are dumped into a folder called "HBPVR" at the root of the drive. I'd be running the script on my Mac when the drive is connected to it.
Example.
"Channel_4_+1-15062015-2100-Exams__Cheating_the_....mts"
or
"BBC_Two_HD-19052015-2320-Newsnight.mts"
I included the double-quotes.
I'd like a Python script that (ideally) would remove the broadcaster name, reformat the date info, strip the time info and then put the show's name to the front of the file name.
E.g "BBC_Two_HD-19052015-2320-Newsnight.mts" ->> "Newsnight 19 May 2015.mts"
What may complicate matters is that the broadcaster names are not all of equal length.
The main pattern is that broadcaster name runs up until the first hyphen.
I'd like to be able to re-run this script at later points for newer recordings and not have already renamed recordings renamed further.
Thanks.
Try this:
import calendar
input = "BBC_Two_HD-19052015-2320-Newsnight.mts"
# Remove broadcaster name
input = '-'.join(input.split("-")[1:])
# Get show name
show = ''.join(' '.join(input.split("-")[2:]).split(".mts")[:-1])
# Get time string
timestr = ''.join(input.split("-")[0])
day = int(''.join(timestr[0:2])) # The day is the first two digits
month = calendar.month_name[int(timestr[2:4])] # The month is the second two digits
year = timestr[4:8] # The year is the third through sixth digits
# And the new string:
new = show + " " + str(day) + " " + month + " " + year + ".mts"
print(new) # "Newsnight 19 May 2015.mts"
I wasn't quite sure what the '2320' was, so I chose to ignore it.
Thanks Coder256.
That has given me a bit more insight into how Python can actually help solve real world (first world!) problems like mine.
It tried it out with some different combos of broadcaster and show names and it worked.
I would like though to use the script to rename a batch of recordings/files inside the folder from time to time.
The script did throw and error when processing an already re-named recording, which is to be expected I guess. Should the renamed file have a special character at the start of its name to help avoid this happening?
e.g "_Newsnight 19 May 2015.mts"
Or is there a more aesthetically pleasing way of doing this, with special chars being added on etc.
Thanks.
One way to approach this, since you have a defined pattern is to use regular expressions:
>>> import datetime
>>> import re
>>> s = "BBC_Two_HD-19052015-2320-Newsnight.mts"
>>> ts, name = re.findall(r'.*?-(\d{8}-\d{4})-(.*?)\.mts', s)[0]
>>> '{} {}.mts'.format(name, datetime.datetime.strptime(ts, '%d%m%Y-%H%M').strftime('%d %b %Y'))
'Newsnight 19 May 2015.mts'

How to import files from name automatically in python

Let's assume to make it easier that I have 10 files named "1","2",...,"10".
Today I am in a situation where i want to load in a script those 10 files, one at a time.
I am using that code, which is written ten times in a row with in between the mathemical operations I want to use on the Data contained in those files :
Tk().withdraw()
filename2 = askopenfilename()
with load(filename2) as data:
..."mathematical operations"...
Tk().withdraw()
filename3 = askopenfilename()
with load(filename3) as data:
etc,etc ...
This way opens 10 dialog boxes,one after one, where I need to type the name of the file to load it ( so I type "1", hit enter, then type "2" in the next box, hit enter, blablabla ).
I am looking for a way to have only one box of dialog to open (or maybe you know something even smarter to do), and type one time in a row the right order of numbers so the script load them one at a time on by himself.
In other words, in a short amount of time I will have 300 files, I just want to type once :1,2,3,4,5,...,300 and hit enter, rather than doing what I described earlier.
Or maybe a way to just type "300" and the script knows he has to look for files starting at "1" et incrementing one by one.
The open function just takes a string, and you can create that string any way you want. You can concatenate the static parts of your filename with a changing number in a for loop:
s_pre = 'file'
s_ext = '.txt'
numFiles = int(raw_input("number of files: "))
for i in range(1, numFiles + 1):
filename = s_pre + str(i) + s_ext
with open(filename) as data:
## input stuff
## math stuff
I assume load is your function, and you can just pass this filename in the loop to load as well.

Elegant way to distinct Path or Entry key

I have an application loading CAD data (Custom format), either from the local filesystem specifing an absolute path to a drawing or from a database.
Database access is realized through a library function taking the drawings identifier as a parameter.
the identifiers have a format like ABC 01234T56-T, while my paths a typical windows Paths (eg x:\Data\cadfiles\cadfile001.bin).
I would like to write a wrapper function Taking a String as an argument which can be either a path or an identifier which calls the appropriate functions to load my data.
Like this:
Function CadLoader(nameOrPath : String):TCadData;
My Question: How can I elegantly decide wether my string is an idnetifier or a Path to a file?
Use A regexp? Or just search for '\' and ':', which are not appearing in the Identifiers?
Try this one
Function CadLoader(nameOrPath : String):TCadData;
begin
if FileExists(nameOrPath) then
<Load from file>
else
<Load from database>
end;
I would do something like this:
function CadLoader(nameOrPath : String) : TCadData;
begin
if ((Pos('\\',NameOrPath) = 1) {UNC} or (Pos(':\',NameOrPath) = 2) { Path })
and FileExists(NameOrPath) then
begin
// Load from File
end
else
begin
// Load From Name
end;
end;
The RegEx To do the same thing would be: \\\\|.:\\ I think the first one is more readable.
In my opinion, the K.I.S.S. principle applies (or Keep It Simple Stupid!). Sounds harsh, but if you're absolutely certain that the combination :\ will never be in your identifiers, I'd just look for it on the 2nd position of the string. Keeps things understandable and readable. Also, one more quote:
Some people, when confronted with a
problem, think "I know, I'll use
regular expressions." Now they have
two problems. - Jamie Zawinski
You should pass in an additional parameter that says exactly what the identifier actually represents, ie:
type
CadLoadType = (CadFromPath, CadFromDatabase);
Function CadLoader(aType: CadLoadType; const aIdentifier: String): TCadData;
begin
case aType of
CadFromPath: begin
// aIdentifier is a file path...
end;
CadFromDatabase: begin
// aIdentifier is a database ID ...
end;
end;
end;
Then you can do this:
Cad := CadLoader(CadFromFile, 'x:\Data\cadfiles\cadfile001.bin');
Cad := CadLoader(CadFromDatabase, 'ABC 01234T56-T');