Excel VBA help - run a series of regex find and replaces - regex

I have a worksheet that has become very complex. On it, there is a sheet in which a user will paste data about once every other day. The data will always be in the same format, and is provided to us in an exact way only. Once pasted in, I need a way for a very average user of excel to be able to press a button (or key combo, or whatever) and excel will run a series of about 8-10 regex find and replaces. All of these will be on column A of the data. Once those are all run, a simple formula would be run on every cell C2 and below in column C. Those columns should be reduced by 80% - =C2*.8
This should all be done with minimal user input if possible.
Would anybody much more versed in regex or excel know a better direction for me to look for a proper start? What resources would be recommended to best accomplish this?

If you're multiplying by some factor, then regexp substitution will be overkill. Excel is very good at multiplying an array of numbers by 0.8.
Search for "Excel paste factor" and you'll get an easy explanation, such as this one.
I might record a macro for your less-experienced users and hope that the previous user pasted the numbers in with absolute perfection.

Related

Replicating conditional formatting Google Sheets

I am trying to create a simple table that I can just replicate over and over when needed. Although in my sheet, I have the first range, B3:D12 working exactly as I want, I am finding it a challenge to then copy the formatting across to E3:G12 and for it to work subsequently.
Is the formula wrong? Is there an easier way that I can do this to make it simple each time I copy + paste the table across?
Thanks
Google Sheet Conditional Formatting
apply this:
=(B3=D3)*(B3<>"")
to B3:12, C3:12 and D3:12 as green
then as red use:
=(B3<>D3)*(B3<>"")
on B3:12

Conditional Formatting Depending Upon Multiple Numbers

I have a column of values that are a number out of 10. So, it could be 2/10, 3/10, 4/10 and so on, all the way up to 10/10. To be clear, these are not dates, but simply showing how many questions the student answered correctly out of 10.
I'm trying to use conditional formatting to highlight them a certain color depending upon the score they got. For 9/10 and 10/10, I'm wanting to use a certain color, but it doesn't seem to be working with REGEXMATCH or with OR. Also wanting to highlight all scores that are 6/10 or lower. I know that I could make this work by applying conditional formatting for each and every score with text contains but the problem I'm finding is that it thinks it's a date.
Is there a way to match multiple scores out of 10 using REGEXMATCH?
Link to Sheet
select column and change formatting to Plain text
now you can use formula like:
=REGEXMATCH(A1; "^9|10\/")

Extract list from range Google Sheets

I have some data from workplaces with some different work areas, I need to extract a list for each workplace with their corresponding availables working areas, I have an example of some kind of attempt really close what I wanted. I use this formula but with more data will be long time to do it =IF(D2=$G$1, "Yes", "No"). I want to do it more automatic with some formulas but I don't know where to start.
Give a try on below formula. Put the formula to G1 cell then drag down as needed.
=TRANSPOSE(IFERROR(FILTER($D$2:$D$16,$A$2:$A$16=F2,$D$2:$D$16<>""),""))

AWS GroundTruth text labeling - hide columns in the data, and checking quality of answers

I am new to SageMaker. I have a large csv dataset which I would like labelled:
sentence_id
sentence
pre_agreed_label
148392
A sentence
0
383294
Another sentence
1
For each sentence, I would like a) a yes/no binary classification in response to a question, and b) on a scale of 1-3, how obvious the classification was. I need the sentence id to map to other parts of the dataset, and will use the pre-agreed labels to assess accuracy.
I have identified SageMaker GroundTruth labelling jobs as a possible way to do this. Is this the best way? In trying to set it up I have run into a few problems.
The first problem is I can't find a way to display only the sentence column to the labellers, hiding the sentence_id and pre_agreed_labels.
The second is that there is either single labelling or multi labelling, but I would like a way to have two sets of single-selection labels:
Select one for binary classification:
Yes
No
Select one for difficulty of classification:
Easy
Medium
Hard
It seems as though this can be done using custom HTML, but I don't know how to do this - the template it gives you doesn't even render
Finally, having not used mechanical turk before, are there ways of ensuring people take the work seriously and don't just select random answers? I can see there's an option to have x number of people answer the same question, but is there also a way to put in an obvious question to which we already have a 'pre_agreed_label' every nth question, and kick people off the task if they get it wrong? There also appears to be a maximum of $1.20 per task which seems odd.

Comparing two documents

I have two very large lists. They both were originally in excel, but the larger one is a list of emails (about 160,000) of them with other information like their name and address etc. And the smaller one is a list of just 18,000 emails.
My question is what would be the easiest way to get rid of all 18,000 rows from the first document that contain the email addresses from the second?
I was thinking regex or maybe there is another application I can use? I have tried searching online but it seems like there isn't much specific to this. I also tried notepad++ but it freezes when I try to compare these large files.
-Thank You in Advance!!
Good question. One way I would tackle this is making a C++ program [you could extrapolate the idea to the language of your choice; You never mentioned which languages you were proficient in] that read each item of the smaller file into a vector of strings. First, of course, use Excel to save the files as CSV instead of XLS or XLSX, which will comma-separate the values so you can work with them easier. For the larger list, "Save As" a copy of just email addresses, deleting the other rows for now.
Then, you could open the larger list and use a nested loop to check if you should output to an output file. Something like:
bool foundMatch=false;
for(int y=0;y<LargeListVector.size();y++) {
for(int x=0;x<SmallListVector.size();x++) {
if(SmallListVector[x]==LargeListVector[y]) foundMatch=true;
}
if(!foundMatch) OutputVector.append(LargeListVector[y]);
foundMatch=false;
}
That might be partially pseudo-code, but do you get the idea?
So I read a forum post at : Here
=MATCH(B1,$A$1:$A$3,0)>0
Column B would be the large list, with the 160,000 inputs and column A was my list of things I needed to delete of 18,000.
I used this to match everything, and in a separate column pasted this formula. It would print out either an error or TRUE. If the data was in both columns it printed out true.
Then because I suck with excel, I threw this text into Notepad++ and searched for all lines that contained TRUE (match case, because in my case some of the data had the word true in it without caps.) I marked those lines, then under search, bookmarks, I removed all lines with bookmarks. Pasted that back into excel and voila.
I would like to thank you guys for helping and pointing me in the right direction :)