Separate text using regex - regex

I have a string like
abcdefangners
and a set of numbers that specifies how to group the above string, such as
3,4
In this case, the output should be
abc,defa,gners
Is something like this possible using regex? I have one option of using a loop to get the comparisons of the set one by one, but is there a better way to do it?

You could do:-
/(.{3})(.{4})(.*)/
This would give you the substrings which you'd then have to join together.
You'd have to create the regexp for each set of numbers so it would not be as easy as other methods of string manipulation.

Related

Regex matching either positive/negative floats, ints or string

I want to be able to match and parse some parameters read from a file such as :
"type:int,register_id:15,value:123456"
"type:int,register_id:16,value:-456789"
"type:double,register_id:17,value:123.456"
"type:double,register_id:18,value:-456.789"
"type:bool,register_id:19,value:true"
"type:bool,register_id:20,value:false"
"type:string,register_id:17,value:Test Set Data Register"
I've come up with the following Regex expression :
(^(type:)\b(bool|int|double|string)\b,(\bregister_id:\b)([1-9][0-9]),(\bvalue:\b)(.)$)
but I have issues where there are negative floats or ints, I can't get the hyphen sorted properly ...
Can someone point me in the right direction ?
https://regex101.com/r/WhXmBE/3
Thanks !
Tried [\s\S] but it reads everything, tried -? as well
Given your example, this seems to work:
(^(type:)(bool|int|double|string),(register_id:)([1-9][0-9]*),(value:)(.*)$)
At least from the example, I didn't see why the \b are necessary. Apologies if I missed something.
Looking at what you try to achieve, I would actually consider moving away from regexes, as regexes by themselves add complexity. You will likely have an easier life if you approach it like this:
Split the line by "," to get the key value pairs
Split each key value pair by the first ":" to split key and value
Validate that all keys are present and that every value matches the format for the key (e.g. if the type is bool then the value should parse to a bool)
You can easily adjust every step to e.g. trim whitespaces.
Edit: Fixed typo

Starting position for replace function in db2

I'm converting some Access VBA functionality to DB2 and found a vital difference. VBA lets you specify the starting point in the character string you're working on. DB2 doesn't have that option. It starts from position 1 and replaces whatever you want to be replaced in the whole string. How can I make DB2 start the replace at a specified place in the string? For example, my string is "Incongruent Plastics Incorporated" and I want to replace the second "Inc" at position 22 with "Inc". I'm doing this in a WHILE loop, going through long strings, replacing parts of them until they are less than a specified maximum (15 or 30 depending on the field).
I looked at the Locate function, but I'm not sure that's right.
Replace(a.PAYEE_STD_NAME, B.FullWord, B.abbreviation, B.mLastWord)
Where a.PAYEE_STD_NAME is the string I'm looking at, B.FullWord is what I want to replace, B.abbreviation is what I want to replace it with, and B.mLastWord is the position where I want to start replacing. Something like Replace("Incongruent Plastics Incorporated","Incorporated","Inc",22)
I expect the characters to be replaced starting in the position I need, towards the back of the string, not in the beginning.
Thanks!
Not that good at DB2, but that limitation can generally be worked around by using SUBSTR
The equivalent of Replace(a.PAYEE_STD_NAME, B.FullWord, B.abbreviation, B.mLastWord) would be:
CONCAT(SUBSTR(a.PAYEE_STD_NAME, 1, B.mLastWord - 1), Replace(SUBSTR(a.PAYEE_STD_NAME, b.mLastWord), B.FullWord, B.abbreviation))
This assumes b.mLastWord is greater than 1, if it's 1 you can use a normal REPLACE.
Maybe consider using REGEXP_REPLACE https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0061496.html
and possibly consider recusrive SQL rather than looping logic

Excel- Extract Number from Cell

I have multiple cells that I am attempting to extract a number from, and need help finding a regex alternative.
The cells range in the following formats:
asdfs. Seat#29 asfddsa
asdfsa. Seat#5d
asdfasN/A . Seat#22 as789fsd
Seat#111 words33
The closest that I came to a solution is:
=IFERROR(TRIM(MID([#DisplayName],FIND("#",[#DisplayName])+1,3)),"")
As you can see this will extract most of the numbers but for some it leaves a character at the end.
The only commonality is the # preceding the seat number. I am trying to extract only the seat number, no other numbers.
I cannot use VBA, this must be done using formulas. I have figured this out once before but stupidly pasted over the formulas with a values only paste.
This can be done utilizing a flash fill, but I was hoping for a more stable formula.
If you want just the numbers then use:
=--MID(A1,FIND("#",A1)+1,AGGREGATE(15,6,ROW(1:5)/(ISERROR(--MID(REPLACE(A1,1,FIND("#",A1),""),ROW(1:5),1))),1)-1)
If you want the letter also then:
=MID(A1,FIND("#",A1)+1,FIND(" ",REPLACE(A1,1,FIND("#",A1),""))-1)
If you do not need the letter following the seat number, you can use
.*#(\d+)
Edit for clarity: Excel does not have regex functions built in. You will either have to use a UDF (I can help with that if you'd like) or use a non-regex solution.
Here is a solution without VBA to extract all numbers inside the strings.
https://drive.google.com/open?id=1Fk6VFznD3i8s6scADy_vXCEj-1zQpBPW
Sheet #3

Possible combination (variations) of words in a string variable in stata

I have a string variable containing school names and I need to find all the possible combination of each word in this string variable in stata:
For example variation of a word "Academy" would be:
Academy,
Academy,
acdamey,
aacdemy,
dmcaamy,
aacedmy,
and so on.
I need this to standardize the raw data of school names, which has many typos of each word due to data entry issues, like the ones given above for "academy".
Depending whether your data is already in the Excel sheets or a file, you can either use regex trying to match all possible combinations (and probably fix them when found) or parse the strings first before bringing them into Excel. In either case you could make a file (or Excel list/table/area/etc.) that includes all the common typos and pick each typo as regex match to use when comparing to your actual input.
Making regexp that would actually find all possible cases is next to impossible, especially if there are cases where very similar (but correct) names for schools exist. In any case direct regexps would be very messy and complex, so I would advice you to parse the data by finding first the correct form, excluding it and then using (greedy) search/regex to find the typoed versions. You can then save the typos to use them as a filter/match/pattern.
To get some sort of starting ideas, check this links:
Regex: Search for verb roots
Read text file and extract string into Excel sheet using regex
P.s You should keep the count of all strings/school names and finally get a list of all names that did not match correct form or any of your regexp filters, so you can manually insert/correct them.

Classic ASP getting value after every comma with regex from a string

Fairly new to classic ASP(maintaining legacy applications) and I need to figure out how to fish out values from a string. String itself can look something like this - 0,12,234,543. I was thinking about making a function where I can specify which number I want from the string for ex.
Function fnGetNumber(string, 3)
// returns the third number(number after the second comma ie. 234)
End Function
The string will always have only numbers and always 4 of them. Also they will not have decimal places.
The function itself is not a problem, but I can't figure out the regex.
Using Split is the way, use it to transform the string to an array and pull the data from that.
I wouldn't recommend regex as whilst it powerful, its readability is poor.