Regex to get text between 2 large spaces - regex

I want to try and regex this text to only get "Second Baptist School" as the output by using Customer: as the set beginning for it to recognize. How would I get it so that it recognizes the beginning and gets all of the text in between the large sections of blanks?
Customer: Second Baptist School Date of Sale: 9/26/2022
Right now I'm using Customer:\s*([^ -.]+) but it only gets "Second" as the output.

You can look for 2 or more white spaces with:
Customer:\s*(.*?)\s{2,}
This should match your example above. The {2,} quantifier means "two or more".
https://regex101.com/r/1HapOO/1
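For anyone who wants to try this outside regex101, here is a minimal Python sketch of the same pattern. The sample line is an assumption: the wide gaps are written as runs of spaces, since the exact spacing of the original text is not known.

```python
import re

# Assumed sample: the gaps around the customer name are runs of 2+ spaces.
line = "Customer:     Second Baptist School        Date of Sale: 9/26/2022"

# Lazily capture everything after "Customer:" up to the first run of 2+ whitespace.
match = re.search(r"Customer:\s*(.*?)\s{2,}", line)
customer = match.group(1) if match else None
print(customer)  # Second Baptist School
```

If the name can be the last thing on a line, there is no trailing gap to stop at; ending the pattern with (?:\s{2,}|$) instead of \s{2,} should cover that case.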

Related

Regex for values that are in between spaces

I am new to regex and having difficulty obtaining values that are caught in between spaces.
I am trying to get the values "field 1" and "abc/def try" from the sample data below using only regex.
Currently I'm using (^.{18}\s+) to skip the first 18 characters, but I'm at a loss as to how to grab values that contain spaces.
A1234567890 field 1 abc/def try
02021051812 12 test test 12 pass
3333G132021 no test test cancel
Any help/pointers will be appreciated.
If this text has fixed-width columns, you can match and trim the column values knowing the amount of chars between start of string and the column text.
For example, this regex will work for the text you posted:
^(.*?)\s*(?<=.{19})(.*?)\s*(?<=^.{34})(.*?)\s*(?<=^.{46})
See the regex demo.
So, Column 2 starts at Position 19, Column 3 starts at Position 34 and Column 4 (end of string here) is at Position 46.
However, this regex is not that efficient, and it would be really great if the data format is fixed on the provider's side.
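If the columns really are fixed-width, plain slicing may be simpler than a regex. Below is a minimal Python sketch under that assumption. The column boundaries (0, 19, 34, 46) are taken from the positions mentioned above, and the sample row is rebuilt with padding because the original spacing is not known; BOUNDS and split_fixed are names made up for this example.

```python
# Assumed column boundaries, based on the positions 19, 34 and 46 mentioned above.
BOUNDS = [(0, 19), (19, 34), (34, 46)]

def split_fixed(line):
    """Slice one fixed-width line into its columns and trim the padding."""
    return [line[start:end].strip() for start, end in BOUNDS]

# Hypothetical row, padded so that each column starts at the assumed offset.
row = f"{'A1234567890':<19}{'field 1':<15}{'abc/def try':<12}"
fields = split_fixed(row)
print(fields)  # ['A1234567890', 'field 1', 'abc/def try']
```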
Since it is not known whether the data always has the same length, I created the following, which gives you one group per column that you might want to use:
^((\s{0,1}\S{1,})*)(\s{2,})((\s{0,1}\S{1,})*)(\s{2,})((\s{0,1}\S{1,})*)
Regex demo
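Here is a small Python sketch of how the groups from that pattern could be read. It assumes the columns are separated by two or more spaces and that values contain single spaces at most; the sample line is padded by hand for illustration. The re.split line is a simpler alternative of my own, not part of the answer above.

```python
import re

# Assumed sample: 2+ spaces between columns, single spaces inside values.
line = "A1234567890       field 1        abc/def try"

pattern = re.compile(
    r"^((\s{0,1}\S{1,})*)(\s{2,})((\s{0,1}\S{1,})*)(\s{2,})((\s{0,1}\S{1,})*)"
)
m = pattern.match(line)

# Groups 1, 4 and 7 are the outer column groups; the others are helpers.
columns = [m.group(1), m.group(4), m.group(7)]

# Alternative: split on every run of 2+ whitespace characters.
split_columns = re.split(r"\s{2,}", line.strip())

print(columns)
print(split_columns)
```

Both approaches should give the same three values here; the split version does not care how many columns there are.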

Regex - Alteryx - Parse - How to find an expression starting by the end of the string

I need to parse the following expression:
Fertilizer abc 7-15-15 5KG BOX 250 KG
in 3 fields:
The product description: Fertilizer abc 7-15-15
Size: 250
Size unit: KG
I do not know how to proceed. Any help and explanation would be appreciated.
Try this in the Alteryx RegEx tool with Parse selected as the Method:
([A-Za-z ]* [\d-]{6,8}) ([A-Z\d]{2,6}) (.{1,5}?) (\d*) ([A-Z]*)
You can test it at Regexpal to see the breakdown of each group. Essentially, the first set of parentheses captures your product description (letters and spaces, followed by 6-8 characters made up of digits and dashes), the 2nd and 3rd groups absorb the extra information that you don't want, the 4th group is just the digits, and the 5th group is any uppercase text afterwards.
Note that this will change dramatically if your data has digits where there are currently letters, and so on.
You can always break it up into even smaller groups and then concatenate back together as well.
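To check the grouping outside Alteryx, the same pattern can be run in Python. This is only a sketch against the single example from the question; the first character class is written as [A-Za-z ] so that it matches letters and spaces only.

```python
import re

text = "Fertilizer abc 7-15-15 5KG BOX 250 KG"

pattern = re.compile(
    r"([A-Za-z ]* [\d-]{6,8}) ([A-Z\d]{2,6}) (.{1,5}?) (\d*) ([A-Z]*)"
)
m = pattern.search(text)

description = m.group(1)  # product description
size = m.group(4)         # digits only
unit = m.group(5)         # text after the digits
print(description, size, unit, sep=" | ")
```

Groups 2 and 3 ("5KG" and "BOX" in this example) are matched but not used.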

SQLite: How to split a column

I have a column containing two names, which I'd like to extract into two separate columns surname1 and surname2 (I don't need the name nor the initial letter (e.g. N.)).
The exemplary content of that column is:
AwyeEaef2012 MS101 N.Lopez-O.Lorenzi.txt
Lopez and Lorenzi are the two surnames we are looking for in this row.
What is good about my situation is that the first name always comes after the first dot (.) and ends just before the dash (-), and the second name comes just after the second dot and ends just before the third dot and txt (.txt).
I know how to write a regex and how to use LIKE to check whether that column contains a specific surname, but not the opposite: how to read the surnames and write them into two new columns.
Several rows from that column look like below:
WyeEaef MN2014 MS401 N.Lopez-O.Lorenzi.txt
AwyufEQ WCH2014 OS401 N.Lorenzi-O.Lopez.txt
THAFa5u WCH2014 LS107 N.Larry-O.Lolly.txt
So the pattern is as I mentioned *.Name1-[A-Z].Name2.txt
Where * is max 30 characters of capital and small letters and numbers
It could be approached like this: we need to split the string into substrings at the dots. The first substring is discarded; the second, without its last two characters (a dash and a capital letter, e.g. -O), is the first name; the third substring is the second name; and the fourth is discarded as well (the file extension).
I'd like to have an output of three columns:
initialColumn, firstName, secondName
Here is the workaround that I wrote as a formula in Excel. I personally don't love it, but it might be useful for someone in the future.
=MID(A1;FIND(".";A1;1)+1;FIND(".";A1;FIND(".";A1;1)+1)-FIND(".";A1;1)-3)
I was surprised that Excel can process ~0.5 million records in the blink of an eye.
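The same dot-and-dash splitting can probably be done inside SQLite with its built-in instr() and substr() functions. The sketch below runs the query through Python's sqlite3 module against an in-memory database; the table name files is made up for the example, and it assumes the part before the first surname never contains a dot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (initialColumn TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO files VALUES (?)",
    [
        ("WyeEaef MN2014 MS401 N.Lopez-O.Lorenzi.txt",),
        ("AwyufEQ WCH2014 OS401 N.Lorenzi-O.Lopez.txt",),
        ("THAFa5u WCH2014 LS107 N.Larry-O.Lolly.txt",),
    ],
)

query = """
WITH step1 AS (
    -- rest1: everything after the first dot, e.g. 'Lopez-O.Lorenzi.txt'
    SELECT initialColumn,
           substr(initialColumn, instr(initialColumn, '.') + 1) AS rest1
    FROM files
),
step2 AS (
    -- rest2: everything after the second dot, e.g. 'Lorenzi.txt'
    SELECT initialColumn, rest1,
           substr(rest1, instr(rest1, '.') + 1) AS rest2
    FROM step1
)
SELECT initialColumn,
       substr(rest1, 1, instr(rest1, '-') - 1) AS firstName,
       substr(rest2, 1, instr(rest2, '.') - 1) AS secondName
FROM step2
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A surname that itself contains a dash would be cut short by this query, so it is worth checking the data for that first.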

Regex selecting the last 6 numbers of

I am a noob at regex and I've been trying to select 6 numbers from within a file and then replace those 6 numbers with the same numbers plus a comma and a new line (making a CSV, obviously).
Anyway sample data is simply nonsense like this:
fafksadjlkgtjafglkj210000adsfaklgjadklgjag3600001skfjaklaj093i393593390000002sadfljafkjgakjgasafksadjlkgtjafglkj£94.00 489438adsfaklgjadklgjag7700001skfjaklaj093i393593390000002ssafksa djlkgtjafglkj000000adsfaklgjadklgjag0000001skfj aklaj093i393593£39.00900002ssafksadjlk gtjafglkj000000adsfaklgjadklgjag0000001skfjaklaj093i3935£933.90000002s
Note some of the numbers are attached to currency values as well (and some are next to it but contain a space before hand) but the end will always be 6 numbers (consider them to be random as I can't see a pattern).
So I basically need to select strings matching numerics that are six digits long or longer, if longer then it just uses the last 6 digits.
Then I will replace it with itself and a comma and new line.
I hope that makes sense; I've tried a few things without success.
Thanks. Edit: the closest I have is:
(\d)\d{6}(?!\d)
In the Find what: text field, type in (\d{6})(\D). In the Replace with: text field, type in $1\r\n$2. Make sure that the regular expression radio button is selected. For your input, that should yield this:
fafksadjlkgtjafglkj210000
adsfaklgjadklgjag3600001
skfjaklaj093i393593390000002
sadfljafkjgakjgasafksadjlkgtjafglkj£94.00 489438
adsfaklgjadklgjag7700001
skfjaklaj093i393593390000002
ssafksa djlkgtjafglkj000000
adsfaklgjadklgjag0000001
skfj aklaj093i393593
£39.00900002
ssafksadjlk gtjafglkj000000
adsfaklgjadklgjag0000001
skfjaklaj093i3935£933.90000002
s
You want
\d{6}(?=\D*$)
Read more about anchors here.
i've been trying to select 6 numbers from within a file and then replace those 6 numbers with the same numbers plus , new line
So you're basically trying to do this, right?:
Find:
(\d{6})(\D)
Replace:
\1\n\2
[Online example]
How about:
Find what: (\d{6,})(?:\D*)$
Replace with: $1,\n
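If scripting is an option, here is a minimal Python sketch of the "last six digits of any run" idea. It uses a negative lookahead, which is a slightly different technique from the find/replace pairs above, and the sample string is a shortened, made-up input in the style of the question. The function name is hypothetical.

```python
import re

def add_comma_newline(text):
    # (\d{6})(?!\d) matches six digits that are not followed by another digit,
    # i.e. the last six digits of any run of six or more digits.
    return re.sub(r"(\d{6})(?!\d)", r"\1,\n", text)

sample = "ab210000cd3600001ef"  # made-up sample, not the original data
print(add_comma_newline(sample))
```

Runs shorter than six digits, such as the parts of a currency value like £94.00, are left untouched.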

Regular Expression help needed to convert lst file to csv

I have a file (ratings.lst) downloaded from IMDB Interfaces. The content appears to be in the following format:
Distribution Votes Rating Title
0000001222 297339 8.4 Reservoir Dogs (1992)
0000001223 64504 8.4 The Third Man (1949)
0000000115 48173 8.4 Jodaeiye Nader az Simin (2011)
0000001232 324564 8.4 The Prestige (2006)
0000001222 301527 8.4 The Green Mile (1999)
My aim is to convert this file into a CSV file (comma separated) with the following desired result (example for 1 line) :
Distribution Votes Rating Title
0000001222, 301527, 8.4, The Green Mile (1999)
I am using TextPad, which supports regex-based search and replace. I'm not sure what regex is needed to achieve the desired result above. Can somebody please help me with this? Thanks in advance.
The other regular expressions are somewhat overcomplicated. Because whitespace is guaranteed not to appear in the first three columns, you don't have to do a fancy match - "three columns of anything separated by whitespace" will do.
Try replacing ^(.+?)\s+(.+?)\s+(.+?)\s+(.+?)$ with \1,\2,\3,"\4" giving the following output (using Notepad++)
Distribution,Votes,Rating,"Title"
0000001222,297339,8.4,"Reservoir Dogs (1992)"
0000001223,64504,8.4,"The Third Man (1949)"
0000000115,48173,8.4,"Jodaeiye Nader az Simin (2011)"
0000001232,324564,8.4,"The Prestige (2006)"
0000001222,301527,8.4,"The Green Mile (1999)"
Note the use of a non-greedy quantifier, .+?, to prevent accidentally matching more than we should. Also note that I've enclosed the fourth column with quote marks "" in case a comma appears in the movie title - otherwise the software you use to read the file would interpret Avatar, the Last Airbender as two columns.
The nice tabular alignment is gone - but if you open the file in Excel it will look fine.
Alternately, just do the entire thing in Excel.
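If a script is acceptable instead of an editor, the same "three whitespace-free columns, then the rest" idea could look like this in Python. This is a sketch, not a tested converter for the full IMDB file: lst_to_csv is a name made up here, lines that do not have four columns are skipped, and the csv module only quotes a title when it needs to (for example when it contains a comma).

```python
import csv
import io
import re

# Three columns without whitespace, then the rest of the line as the title.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+(.+)$")

def lst_to_csv(text):
    """Convert whitespace-separated ratings lines to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:  # skip blank or malformed lines
            writer.writerow(m.groups())
    return out.getvalue()

sample = """Distribution  Votes  Rating  Title
0000001222  297339   8.4  Reservoir Dogs (1992)
0000001223   64504   8.4  The Third Man (1949)
"""
print(lst_to_csv(sample))
```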
First replace all " with "" then do this:
Find: ^\([0-9]+\)[ \t]+\([0-9]+\)[ \t]+\([^ \t]+\)[ \t]+\(.*\)
Replace with: \1,\2,\3,"\4"
Press F8 to open Replace dialog
Make sure Regular Expression is selected
In Find what: put: ^([[:digit:]]{10})[[:space:]]+([[:digit:]]+)[[:space:]]+([[:digit:]]{1,2}\.[[:digit:]])[[:space:]]+(.*)$
In Replace with: put \1,\2,\3,"\4"
Click Replace All
Note: This uses 1 or more spaces between fields from ratings.lst - you might be better off specifying the exact number of spaces if you know it.
Also note: I didn't put spaces between the comma-separated items, as generally you don't, but feel free to add those in.
Final Note: I put the movie title in quotes, so that if it contains a comma it doesn't break the CSV format. You may want to handle this differently.
My bad: this is a C# program. I will leave it up as an alternate solution.
The IgnorePatternWhitespace option is there so that the pattern can be commented.
This will create data which can be placed into a CSV file. Note that CSV files do not have optional whitespace in them, as your example does.
using System;
using System.Linq;
using System.Text.RegularExpressions;

string data = @"Distribution Votes Rating Title
0000001222 297339 8.4 Reservoir Dogs (1992)
0000001223 64504 8.4 The Third Man (1949)
0000000115 48173 8.4 Jodaeiye Nader az Simin (2011)
0000001232 324564 8.4 The Prestige (2006)
0000001222 301527 8.4 The Green Mile (1999)
";
string pattern = @"
^                     # Always start at the beginning of a line
(                     # Grouping
  (?<Value>[^\s]+)    # Place all text into the Value named capture
  (?:\s+)             # Match but don't capture one or more whitespace characters
){3}                  # 3 groups of data
(?<Value>[^\n\r]+)    # Append the rest of the line to the Value named capture group
";
// Join the captures of each matched line with commas.
var result = Regex.Matches(data, pattern, RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace)
    .OfType<Match>()
    .Select(mt => string.Join(",", mt.Groups["Value"].Captures
        .OfType<Capture>()
        .Select(c => c.Value)));
// Print one CSV line per match; printing the sequence itself would only show its type name.
Console.WriteLine(string.Join(Environment.NewLine, result));
/* output
Distribution,Votes,Rating,Title
0000001222,297339,8.4,Reservoir Dogs (1992)
0000001223,64504,8.4,The Third Man (1949)
0000000115,48173,8.4,Jodaeiye Nader az Simin (2011)
0000001232,324564,8.4,The Prestige (2006)
0000001222,301527,8.4,The Green Mile (1999)
*/