Extract all numeric values in Columns in R - regex

I am currently working with a large data set using R. So, I have a column called "Offers". This column contains text describing 'promotions' that companies offer on their products. I am trying to extract numeric values from these. While, for most cases, I am able to do so well using a combination of regex and functions in R packages, I am unable to deal with a couple of specific cases of text shown below. I would really appreciate any help on these.
"Buying this ensures Savings of $50. Online Credit worth 35$ is also available. So buy soon!"
1a. I want to get both the numeric values out but in 2 different columns. How
do I go about that?
1b. For another problem that I have to solve, I only need to take the value associated with the credit. It is always the case that for texts like above, the second numeric value in the text, if it exists, is the one associated with the credit.
"Get 50% off on your 3 night stay along with 25 credits, offer available on 3 December 2016"
(How should I only take the value associated with credits?)
Note: Efficiency would be important as well because I am dealing with about 14 million rows.
I have tried looking online for a solution but have not found anything very satisfactory.

I am not 100% sure about what you want but this may help you.
A <- "do 50% and whatever 23"
B <- gregexpr("\\d+",A)[[1]]
firstNum <- substr(A,B[1],B[1]+attr(B,"match.length")[1]-1)
secondNum <- substr(A,B[2],B[2]+attr(B,"match.length")[2]-1)
Hope this helps.

Related

How to automatically feed a cell value from a range of values, based on its matching condition with other cell value

I'm making a time-spending tracker based on the work I do every hour of the day.
Now, suppose I have 28 types of work listed in my tracker (which I also have to increase from time to time), and I have about 8 significance values that I have decided to relate to these 28 types of work, predefined.
I want that, as soon as I enter a type of work in cell 1 - I want the adjacent cell 2 to get automatically populated with a significance value (from a range of 8 values) that is pre-definitely set by me.
Every time I input a new or old occurrence of a type of work, the adjacent cell should automatically get matched with its relevant significance value & automatically get populated in real-time.
I know how to do it using IF, IFS, and IF_OR conditions, but I feel that based on the ever-expanding types of work & significance values, the above formulas will be very big, complicated, and repetitive in the future. I feel there's a more efficient way to achieve it. Also, I don't want it to be selected from a drop-down list.
Guys, please help me out with the most efficient way to handle this. TUIA :)
Also, I've added a snapshot and a sample sheet describing the problem.
Sample sheet
XLOOKUP() may work. Try-
=XLOOKUP(D2,A2:A,B2:B)
Or FILTER() function like-
=FILTER(B2:B,A2:A=D2)
You can use this formula for a whole column:
=INDEX(IFERROR(VLOOKUP(C14:C,A2:B9,2,0)))
Adapt the ranges to your actual tables in order to include in the second argument all the potential values and their significances
This is the formula, that worked for me (for anybody's reference):
I created another reference sheet, stating the types of work & their significance. From that sheet, I'm using either vlookup, filter, xlookup.Using gforms for inputting my data.
=ARRAYFORMULA(IFS(ROW(D:D)=1,"Significance",A:A="","",TRUE,VLOOKUP(D:D,Reference!$A:$B,2,0)))

Excel Alternative to nested IF

I have a couple of rather large nested if functions in my spreadsheet. It sure would be nice to have an alternative method. Problem is I'm using a wildcard (*) in my lookup because the source text has slight variations (date for example).
For example, if my list of data contains:
VENMO PAYMENT 220828 1022093447487 BRENDA HOSPY
VENMO PAYMENT 220813 1031323447487 BRENDA HOSPY
I want these to show in an adjacent column of cells as just Venmo
Currently my if function in that second column of cells is:
=IF(COUNTIF($F10,"*APPLE.COM/BILL*"),"AP",
IF(COUNTIF($F10,"IIA VOYA*"),"VOYA",
IF(COUNTIF($F10,"VENMO PAYMENT*"),"Venmo",
IF(COUNTIF($F10,etc...
This works fine but quickly gets unruly as more things get added.
I've spent a great deal of time searching for functions and processes that would make this easier, or at least more compact, but I can't find a way with typical functions like vlookup or index/match.
If I've explained this in a comprehensible fashion perhaps you've seen or experienced a similar situation and could offer a suggestion. It would be appreciated!
I'm not opposed to using a programming function.
I've looked at, and for, various Excel functions or combinations with no luck on my own or online.
I have created a structure as below
Formula present in B2 is as below
=IFERROR(INDEX($F$2:$F$9,MIN(IF(COUNTIF(A2,"*"&$E$2:$E$9&"*")>0,ROW($E$2:$E$9),9999999)-1)),"---")
Enter it as an Array Formula using Ctrl+Shift+Enter
It will search all the strings present in column E in A2 when found will return all the row numbers of column E where there is a match, i have then used min to get the first one, and if not found it will return 9999999, and as the data is starting from row 2 i have added -1 to make it equal to the data index. after that i have called the index to search value present at that index in column F. and at the end used the if error function to show --- where no match was found and 999999 was returned.

Conditional Formatting Depending Upon Multiple Numbers

I have a column of values that are a number out of 10. So, it could be 2/10, 3/10, 4/10 and so on, all the way up to 10/10. To be clear, these are not dates, but simply showing how many questions the student answered correctly out of 10.
I'm trying to use conditional formatting to highlight them a certain color depending upon the score they got. For 9/10 and 10/10, I'm wanting to use a certain color, but it doesn't seem to be working with REGEXMATCH or with OR. Also wanting to highlight all scores that are 6/10 or lower. I know that I could make this work by applying conditional formatting for each and every score with text contains but the problem I'm finding is that it thinks it's a date.
Is there a way to match multiple scores out of 10 using REGEXMATCH?
Link to Sheet
select column and change formatting to Plain text
now you can use formula like:
=REGEXMATCH(A1; "^9|10\/")

Extract list from range Google Sheets

I have some data from workplaces with some different work areas, I need to extract a list for each workplace with their corresponding availables working areas, I have an example of some kind of attempt really close what I wanted. I use this formula but with more data will be long time to do it =IF(D2=$G$1, "Yes", "No"). I want to do it more automatic with some formulas but I don't know where to start.
Give a try on below formula. Put the formula to G1 cell then drag down as needed.
=TRANSPOSE(IFERROR(FILTER($D$2:$D$16,$A$2:$A$16=F2,$D$2:$D$16<>""),""))

How can I resolve INDEX MATCH errors caused by discrepancies in the spelling of names across multiple data sources?

I've set up a Google Sheets workbook that synthesizes data from a few different sources via manual input, IMPORTHTML and IMPORTRANGE. Once the data is populated, I'm using INDEX MATCH to filter and compare the information and to RANK each data set.
Since I have multiple data inputs, I'm running into a persistent issue of names not being written exactly the same between sources, even though they're the same person. First names are the primary culprit (i.e. Mary Lou vs Marylou vs Mary-Lou vs Mary Louise) but some last names with special symbols (umlauts, accents, tildes) are also causing errors. When Sheets can't recognize a match, the INDEX MATCH and RANK functions both break down.
I'm wondering how to better unify the data automatically so my Sheet understands that each occurrence is actually the same person (or "value").
Since you can't edit the results of an IMPORTHTML directly, I've set up "helper columns" and used functions like TRIM and SPLIT to try and fix instances as I go, but it seems like there must be a simpler path.
It feels like IFS could work but I can't figure how to integrate it. Also thinking this may require a script, which I'm just beginning to study.
Here's a simplified example of what I'm trying to achieve and the corresponding errors: Sample Spreadsheet
The first tab is attempting to pull and RANK data from tabs 2 and 3. Sample formulas from the Summary tab, row 3 (Amelia Rose):
Cell B3: =INDEX('Q1 Sales'!B:B, MATCH(A3,'Q1 Sales'!A:A,0))
Cell C3: =RANK(B3,$B$2:B,1)
Cell D3: =INDEX('Q2 Sales'!B:B, MATCH(A3,'Q2 Sales'!A:A,0))
Cell E3: =RANK(D3,$D$2:D,1)
I'd be grateful for any insight on how to best index 'Q2Sales'!B3 as the correct value for 'Summary'!D3. Thanks in advance - the thoughtful answers on Stack Overflow have gotten me this far!
to counter every possible scenario do it like this:
=ARRAYFORMULA(IFERROR(VLOOKUP(LOWER(REGEXREPLACE(A2:A, "-|\s", )),
{REGEXEXTRACT(LOWER(REGEXREPLACE('Q2 Sales'!A2:A, "-|\s", )),
TEXTJOIN("|", 1, LOWER(REGEXREPLACE(A2:A, "-|\s", )))), 'Q2 Sales'!B2:B}, 2, 0)))