Extract a filename from json file path - regex

1. env\nonprod\alert\prj-data-02\alt\airflow-alert.json
2. env\nonprod\alert\prj-data-02\alt\biquery-alert.json
I would like to use regex to get the word before "*-alert.json" in the line. For example: from line 1 I would like to get the name 'airflow' and from line 2 I would like to get the value 'bigquery'.

.*\\(.*)-alert\.json$ should work. The parentheses form a capture group, which stores whatever matches the part of the pattern inside them. How you read the captured text depends on the regex implementation you are using: in bash, for example, the [[ string =~ regex ]] test stores capture groups in the BASH_REMATCH array, while sed refers to them as \1, \2, and so on.
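As a rough illustration of reading the capture group, here is a minimal Python sketch. Python is an assumption (the question names no language), and the helper name extract_alert_name is hypothetical.

```python
import re

# Pattern from the answer: the greedy .* runs to the last backslash, then
# group 1 captures everything up to "-alert.json" at the end of the string.
ALERT_NAME = re.compile(r".*\\(.*)-alert\.json$")

def extract_alert_name(path):
    """Return the name before '-alert.json', or None if the path does not match."""
    match = ALERT_NAME.match(path)
    return match.group(1) if match else None

name = extract_alert_name(r"env\nonprod\alert\prj-data-02\alt\airflow-alert.json")
missing = extract_alert_name(r"env\nonprod\alert\prj-data-02\alt\readme.txt")
print(name)     # airflow
print(missing)  # None
```

The raw-string prefix (r"...") keeps the backslashes in both the pattern and the path literal.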

Related

NotePad++ regex match and replace and also keep match to convert to different markdown image reference link

I have the following link syntax that needs to be changed:
![[afoldernamenolongerneededandwillbedeprecated/somemarkdownfilename_image1.png]]
I tried (successfully) with this regex to match:
![[].*[\/].*_image[0-9].png[]]]
Although I have a hunch it may not be what I should use. As a novice, I suspect it may only be good for matching and not for replacing. All images are PNGs, by the way. All filenames contain _image, prefixed by the markdown file name.
Desired end format:
![image](imagenamefromabovestring1,2,orhowevermanythereare.png)
The ![]() syntax is a known way to reference images in markdown. Images will be populated in subdirectories the program/app will find.
It goes without saying I want to run find and replace recursively on some 4000 files containing image references.
I put up the unfinished substitution example here:
https://regex101.com/r/Bl8HJC/1
So to clarify what I need: the formerly present folder name should be gone, as I don't need it anymore. After the slash comes the name of the image, whose syntax is always the same: the name of the file currently being processed by NotePad++ (it can be a markdown file named Ab, Aba, Abracadabra, etc.) serves as a prefix, followed by an underscore and 'image' plus a number that depends on how many images are linked to the markdown file as attachments. The names of the files to go in an attachment folder will look like this:
AB_image3.png
Abracadabra_image2.png
.
.
.
Zodiac_image45.png
I am looking for the right syntax as I couldn't figure it out with the dollar sign.
Cheers,
Otto
I have modified your example to get it working here. What you needed to do is escape the square brackets so they would be interpreted literally, since they have special meaning in regex, and you needed to use a capture group to store the matching value in $1 so you could use it in the replacement.
Regular expression:
!\[\[.*\/(.*_image[0-9]{1,2}\.png)\]\]
Substitution format:
![image]\($1\)
Edit: Question was revised to state that the folder name was unwanted in the final output, so matches are delimited after the final / character in the file path.
Edit 2: Support for file numbers 1 through 99.
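For readers who want to try the same substitution outside NotePad++, here is a small Python sketch. Python is an assumption, the function name convert_links is hypothetical, and Python writes the backreference as \1 rather than $1.

```python
import re

# Regex from the answer; group 1 keeps only the file name after the last slash.
LINK = re.compile(r"!\[\[.*/(.*_image[0-9]{1,2}\.png)\]\]")

def convert_links(text):
    # \1 is Python's spelling of the $1 backreference. Unlike NotePad++,
    # Python's replacement template needs no escaping for the parentheses.
    return LINK.sub(r"![image](\1)", text)

old = "![[afoldernamenolongerneededandwillbedeprecated/somemarkdownfilename_image1.png]]"
new = convert_links(old)
print(new)  # ![image](somemarkdownfilename_image1.png)
```

Because .* is greedy, a line holding several links would be matched as one span from the first ![[ to the last ]], so this sketch assumes one link per line.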

Regex: Identify file name with "string" but exclude if has .filepart extension

I have a requirement to search through a directory to identify specific files with a string contained in the file name. But I want to exclude part loaded files with a ".filepart" extension.
This must be done through Regex due to tool limitations.
The file names can be in multiple formats, and we must identify them from the "file identifier" string that we pass into the Regex.
I have read some very good articles within SO and other websites but I am struggling to nail down the correct syntax.
I have saved a page on regex101.com to provide a more detailed explanation of what I am trying to achieve. The "FILETYPE" can be considered the string we pass into the Regex.
https://regex101.com/r/zTrbyX/4
Thanks,
K
Your original regex:
.*FILETYPE.*\.[[:alnum:]]*(?!filepart)
gives the same result as:
.*FILETYPE.*
because the negative lookahead is only tested after [[:alnum:]]* has already consumed the extension, so it can always be satisfied.
Instead, you could use the following regex (similar to CAustin's solution in the comments):
.*FILETYPE.*(?<!filepart)$
This matches every line that contains FILETYPE and does not end with filepart. Here $ denotes the end of the line. On regex101.com you need to enable the m flag for $ to be recognized as end of line.
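The lookbehind version can be checked with a short Python sketch. Python and the sample file names are assumptions (the names are hypothetical and not taken from the linked regex101 page); re.MULTILINE plays the role of the m flag.

```python
import re

# (?<!filepart)$ asserts that the text right before end-of-line is not "filepart".
COMPLETE = re.compile(r".*FILETYPE.*(?<!filepart)$", re.MULTILINE)

# Hypothetical directory listing, one file name per line.
listing = "\n".join([
    "20200101_FILETYPE_01.csv",
    "20200101_FILETYPE_02.csv.filepart",
    "FILETYPE.txt",
    "OTHER_03.csv",
])

matches = COMPLETE.findall(listing)
print(matches)  # ['20200101_FILETYPE_01.csv', 'FILETYPE.txt']
```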

RegEx SQL, issue escaping quotes

I am trying to use PSQL, specifically AWS Redshift to parse a line. Sample data follows
{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}
{"appId":"sx-voice-call","b.level":76,"foreground":9}
I am trying the following regex in order to extract the appId field, but my query is returning empty fields.
'appId\":\"[\w*]\",'
Query
SELECT app_params,
regexp_substr(app_params, 'appId\":\"[\w*]\",')
FROM sample;
You can do that as follows:
(\"appId\":\"[^"]*\")(?:,)
Demo: http://regex101.com/r/xP0hW3
The first extracted group is what you want.
Your regex was not matching because \w does not match - and because [\w*] is a character class that matches only a single character.
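The difference between the two patterns can be seen in a quick Python sketch. Python's re module is only a stand-in here (an assumption); Redshift's regex engine may differ in details.

```python
import re

line = '{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}'

# Original attempt: [\w*] is a character class, so it matches exactly ONE
# character (a word character or a literal *), never the whole value.
original = re.search(r'appId":"[\w*]",', line)

# Answer's pattern: [^"]* takes everything up to the closing quote.
fixed = re.search(r'("appId":"[^"]*")(?:,)', line)

print(original)        # None
print(fixed.group(1))  # "appId":"sx-calllog"
```

Note that group 1 includes the key as well as the value.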
Adding this here despite this being an old question since it may help someone viewing this down the road...
If your lines of data are valid JSON, you can use Redshift's JSON_EXTRACT_PATH_TEXT function to extract the value of a given key. Emphasis on the JSON being valid: it will fail if even one line cannot be parsed, and Redshift will throw a JSON parsing error.
Example using given data:
select json_extract_path_text('{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}','appId');
returns sx-calllog
This is especially useful since Redshift does not support lookahead/lookbehind (it is POSIX regex) & extract groups.
You can try using lookaheads and lookbehinds to isolate just the text inside the quotes for the appId: (?<=appId\":\")(?=.*\",)[^\"]*. I tested this a bit using the examples you provided.
To explain the regex a bit more: (?<=appId\":\")(?=.*\",)[^\"]*
1. (?<=appId\":\"): positive lookbehind for appId":". Since you don't want the appId text itself returned (just the value), you can preface the regex with a lookbehind to say "find me the following regex, but only when it follows the lookbehind text."
2. (?=.*\",): positive lookahead for the ending ",. You don't want quotes returned in your match, but as with number 1 you want your regex to be bounded a bit, and a lookahead does that.
3. [^\"]*: The actual matching portion. You want to find the string of chars that are NOT ". This will match the entire value and stop matching right before the closing ".
EDIT: Changed the 3rd step a little bit, removed the , from that last piece, it is not needed and would break the match if the value were to actually contain a ,.
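A brief Python sketch of the lookaround pattern on the two sample lines follows. Python is an assumption; it is used because its engine supports lookarounds, which another answer here says Redshift does not.

```python
import re

# Lookbehind pins the match right after appId":" and the lookahead requires
# a closing ", somewhere later; only the value itself is consumed.
VALUE = re.compile(r'(?<=appId":")(?=.*",)[^"]*')

lines = [
    '{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}',
    '{"appId":"sx-voice-call","b.level":76,"foreground":9}',
]

values = [VALUE.search(line).group(0) for line in lines]
print(values)  # ['sx-calllog', 'sx-voice-call']
```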

regex to get portion of file name after last dot without file extension

I have a bunch of files, some examples are as follows:
/foo1/foo2/bar1.bar2.bar3.answer.jar
/foo1/bar1.bar2.answer.jar
/foo1/foo2/answer.jar
and for all of the above I would like a regex that matches 'answer'. In other words, I'm looking to get an alias for the file that is the portion of the file name after the last dot (or the file name itself if there are no dots) with the file extension (.jar can be guaranteed here to make it simpler) stripped off.
I know I can do this with a more simple regex to split the value up by dots and then get the second last one, but in this case I'm building a back-end thing that will ideally take a regex that is defined in a configuration definition for the given file type, and spit out the alias, which might be different for other file types.
Yep, I'm over-engineering. :)
Any ideas?
Following regex should work for you:
[^/.]+(?=\.jar$)
If using Javascript or a similar flavor where / is regex delimiter then you need to escape / like this:
[^\/.]+(?=\.jar$)
You can use the following regexp (assuming that the answer part doesn't contain . or /):
[/\.]([^/\.]+)\.jar
The first capturing group is the part you want.
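Both suggestions can be compared on the question's three paths with a short Python sketch (Python is an assumption; the variable names are illustrative only).

```python
import re

paths = [
    "/foo1/foo2/bar1.bar2.bar3.answer.jar",
    "/foo1/bar1.bar2.answer.jar",
    "/foo1/foo2/answer.jar",
]

# Lookahead version: the whole match is the alias.
LOOKAHEAD = re.compile(r"[^/.]+(?=\.jar$)")
# Capture-group version: the alias is in group 1.
CAPTURE = re.compile(r"[/\.]([^/\.]+)\.jar")

by_lookahead = [LOOKAHEAD.search(p).group(0) for p in paths]
by_capture = [CAPTURE.search(p).group(1) for p in paths]
print(by_lookahead)  # ['answer', 'answer', 'answer']
print(by_capture)    # ['answer', 'answer', 'answer']
```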

Regex: Get Filename Without Extension in One Shot?

I want to get just the filename using regex, so I've been trying simple things like
([^\.]*)
which of course works only if the filename has one extension. But if it is adfadsfads.blah.txt I just want adfadsfads.blah. How can I do this with regex?
In regards to David's question, 'why would you use regex' for this, the answer is, 'for fun.' In fact, the code I'm using is simple
length_of_ext = File.extname(filename).length
filename = filename[0,(filename.length-length_of_ext)]
but I like to learn regex whenever possible because it always comes up at Geek cocktail parties.
Try this:
(.+?)(\.[^.]*$|$)
This will:
Capture filenames that start with a dot (e.g. .logs is a file named .logs, not a file extension), which is common in Unix.
Get everything but the last dot and what follows it: foo.bar.jpeg gets you foo.bar.
Handle files with no dot: secret-letter gets you secret-letter.
Note: as commenter j_random_hacker suggested, this performs as advertised, but you might want to precede things with an anchor for readability purposes.
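The three cases listed above can be checked with a minimal Python sketch. Python is an assumption and strip_extension is a hypothetical helper name; re.match anchors at the start of the string, which has the same effect as the anchor mentioned in the note.

```python
import re

# Group 1 is the lazy "everything"; group 2 is either the last dot plus
# extension, or nothing at all when the name has no extension.
STEM = re.compile(r"(.+?)(\.[^.]*$|$)")

def strip_extension(filename):
    """Return filename without its last extension (unchanged if there is none)."""
    match = STEM.match(filename)
    return match.group(1) if match else filename

names = [".logs", "foo.bar.jpeg", "secret-letter", "adfadsfads.blah.txt"]
stems = [strip_extension(n) for n in names]
print(stems)  # ['.logs', 'foo.bar', 'secret-letter', 'adfadsfads.blah']
```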
Everything followed by a dot followed by one or more characters that's not a dot, followed by the end-of-string:
(.+?)\.[^\.]+$
The everything-before-the-last-dot is grouped for easy retrieval.
If you aren't 100% sure every file will have an extension, try:
(.+?)(\.[^\.]+$|$)
How about two captures, one for the end and one for the filename? E.g.:
(.+?)(?:\.[^\.]*$|$)
^(.*)\\(.*)(\..*)$
This captures three groups:
1. The path without the last \
2. The file name without the extension
3. The extension with a .
Examples:
c:\1\2\3\Books.accdb
(c:\1\2\3)(Books)(.accdb)
Does not support multiple . in file name
Does support . in file path
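The example above can be reproduced with a short Python sketch (Python is an assumption; the answer does not name a language).

```python
import re

# Three groups: path (up to the last backslash), file name, extension.
PARTS = re.compile(r"^(.*)\\(.*)(\..*)$")

match = PARTS.match(r"c:\1\2\3\Books.accdb")
parts = match.groups()
print(parts)  # ('c:\\1\\2\\3', 'Books', '.accdb')
```

The doubled backslashes in the printed tuple are just Python's repr of single backslashes.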
I realize this question is a bit outdated; however, I had some trouble finding a good source and wound up writing the regex myself. To save time for whoever finds this:
If you're looking for a ~standalone~ regex
This will match the extension without the dot
\w+(?![\.\w])
This will always match the file name if it has an extension
[\w\. ]+(?=[\.])
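Here is a small Python sketch of the two standalone patterns. Python and the sample name report.final.txt are assumptions; the sample deliberately has no spaces, because the first pattern would match a word that is followed by a space.

```python
import re

# Matches a run of word characters that is NOT followed by a dot or another
# word character, i.e. the last such run in a name like report.final.txt.
EXTENSION = re.compile(r"\w+(?![\.\w])")
# Matches the longest run of word characters, dots and spaces that is
# followed by a dot, i.e. everything before the last dot.
NAME = re.compile(r"[\w\. ]+(?=[\.])")

filename = "report.final.txt"  # hypothetical sample
ext = EXTENSION.search(filename).group(0)
name = NAME.search(filename).group(0)
print(ext)   # txt
print(name)  # report.final
```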
OK, I am not sure why I would use a regular expression for this. If I know, for example, that the string is a full file path, then I would use another API to get the file name. Regular expressions are very powerful but at the same time quite complex (you have just proved that by asking how to create such a simple regex). As somebody said: you had a problem and decided to solve it using regular expressions. Now you have two problems.
Think again. If you are on .NET platform for example, then take a look at System.IO.Path class.
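The same idea in Python, as a sketch: the answer only mentions .NET's System.IO.Path, so the use of pathlib and os.path here is an assumption chosen as a rough equivalent.

```python
import os.path
from pathlib import PurePosixPath, PureWindowsPath

# .stem drops the directory and the last extension only.
stem_posix = PurePosixPath("adfadsfads.blah.txt").stem
stem_windows = PureWindowsPath(r"c:\1\2\3\Books.accdb").stem

# os.path.splitext treats a leading dot as part of the name, not an extension.
root, ext = os.path.splitext(".logs")

print(stem_posix)    # adfadsfads.blah
print(stem_windows)  # Books
print((root, ext))   # ('.logs', '')
```

The Pure* path classes only manipulate strings, so the sketch runs on any operating system without touching the file system.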
I used this pattern for simple search:
^\s*[^\.\W]+$
for this text:
file.ext
fileext
file.ext.ext
file.ext
fileext
It finds fileext in the second and last lines.
I applied it in a text tree view of a folder (with spaces as indents).
Just the name of the file, without path and suffix.
^.*[\\|\/](.+?)\.[^\.]+$
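A quick Python check of this pattern follows (Python and the second sample path are assumptions). Note that inside the character class the | is a literal pipe, not an alternation, and that the pattern requires at least one path separator.

```python
import re

# Group 1: the file name between the last separator and the last dot.
BASENAME = re.compile(r"^.*[\\|\/](.+?)\.[^\.]+$")

def base_name(path):
    """Return the file name without path and suffix, or None if no match."""
    match = BASENAME.match(path)
    return match.group(1) if match else None

windows = base_name(r"c:\1\2\3\Books.accdb")
posix = base_name("/home/user/archive.tar.gz")  # hypothetical path
bare = base_name("file.txt")                    # no separator, so no match
print(windows, posix, bare)  # Books archive.tar None
```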
Try
(?<=[\\\w\d-:]*\\)([\w\d-:]*)(?=\.[\.\w\d-:]*)
Captures just the filename, of any kind, within an entire file path. It purposefully excludes the file path and the file extension.
For example:
C:\Log\test\bin\fee105d1-5008-410c-be39-883e5e40a33d.pdf
Doesn't capture (C:\Log\test\bin)
Captures (fee105d1-5008-410c-be39-883e5e40a33d)
Doesn't capture (.pdf)
This RegExp works for me:
(.+(?=\..+$))|(.+[^\.])
Results (bold means match):
test.txt
test 234!.something123
.test
.test.txt
test.test2.txt
.
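The matches for the inputs listed above can be recomputed with a short Python sketch (Python is an assumption; the answer does not say which engine produced the original results).

```python
import re

# Alternative 1: everything before the last dot, provided something follows
# that dot. Alternative 2: fallback for names such as ".test".
PATTERN = re.compile(r"(.+(?=\..+$))|(.+[^\.])")

samples = [
    "test.txt",
    "test 234!.something123",
    ".test",
    ".test.txt",
    "test.test2.txt",
    ".",
]

def matched_text(text):
    """Return the matched substring, or None when the pattern does not match."""
    match = PATTERN.search(text)
    return match.group(0) if match else None

results = [matched_text(s) for s in samples]
print(results)
# ['test', 'test 234!', '.test', '.test', 'test.test2', None]
```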