Getting the value of a number cell with VBA in Excel - regex

So I have section columns which look like this:
8.01
8.02
8.03
8.04
8.05
8.06
8.07
8.08
8.09
8.10
And so on and so forth. I have it set up so that it will always show the trailing zeroes (8.10 and 8.20) but when I use VBA to get the value of the cell it still shows 8.1 and 8.2. I'm using
x = Cells.Value
but it won't work the way I need it to. I have to keep the column as number for sorting and other reasons so changing the type isn't really an option I don't think. How do I assign a number with a trailing zero to a variable in VBA? Do I need to run a test case with REGEX or something?

As follow up from comments, this one works:
x = Cells(1,1).Text
or
x = Format(Cells(1,1).Value,"0.00")
you can change "0.00" to any other format you want, e.g.
x = Format(Cells(1,1).Value, "Standard")
code above supposed that x has String type

Related

Matlab: What's the most efficient approach to parse a large table or cell array with regexp when sometimes there is no match?

I am working with a messy manually maintained "database" that has a column containing a string with name,value pairs. I am trying to parse the entire column with regexp to pull out the values. The column is huge (>100,000 entries). As a proxy for my actual data, let's use this code:
line1={'''thing1'': ''-583'', ''thing2'': ''245'', ''thing3'': ''246'', ''morestuff'':, '''''};
line2={'''thing1'': ''617'', ''thing2'': ''239'', ''morestuff'':, '''''};
line3={'''thing1'': ''unexpected_string(with)parens5'', ''thing2'': 245, ''thing3'':''246'', ''morestuff'':, '''''};
mycell=vertcat(line1,line2,line3);
This captures the general issues encountered in the database. I want to extract what thing1, thing2, and thing3 are in each line using cellfun to output a scalar cell array. They should normally be 3 digit numbers, but sometimes they have an unexpected form. Sometimes thing3 is completely missing, without the name even showing up in the line. Sometimes there are minor formatting inconsistencies, like single quotes missing around the value, spaces missing, or dashes showing up in front of the three digit value. I have managed to handle all of these, except for the case where thing3 is completely missing.
My general approach has been to use expressions like this:
expr1='(?<=thing1''):\s?''?-?([\w\d().]*?)''?,';
expr2='(?<=thing2''):\s?''?-?([\w\d().]*?)''?,';
expr3='(?<=thing3''):\s?''?-?([\w\d().]*?)''?,';
This looks behind for thingX' and then tries to match : followed by zero or one spaces, followed by 0 or 1 single quote, followed by zero or one dash, followed by any combination of letters, numbers, parentheses, or periods (this is defined as the token), using a lazy match, until zero or one single quote is encountered, followed by a comma. I call regexp as regexp(___,'tokens','once') to return the matching token.
The problem is that when there is no match, regexp returns an empty array. This prevents me from using, say,
out=cellfun(#(x) regexp(x,expr3,'tokens','once'),mycell);
unless I call it with 'UniformOutput',false. The problem with that is twofold. First, I need to then manually find the rows where there was no match. For example, I can do this:
emptyout=cellfun(#(x) isempty(x),out);
emptyID=find(emptyout);
backfill=cell(length(emptyID),1);
[backfill{:}]=deal('Unknown');
out(emptyID)=backfill;
In this example, emptyID has a length of 1 so this code is overkill. But I believe this is the correct way to generalize for when it is longer. This code will change every empty cell array in out with the string Unknown. But this leads to the second problem. I've now got a 'messy' cell array of non-scalar values. I cannot, for example, check unique(out) as a result.
Pardon the long-windedness but I wanted to give a clear example of the problem. Now my actual question is in a few parts:
Is there a way to accomplish what I'm trying to do without using 'UniformOutput',false? For example, is there a way to have regexp pass a custom string if there is no match (e.g. pass 'Unknown' if there is no match)? I can think of one 'cheat', which would be to use the | operator in the expression, and if the first token is not matched, look for something that is ALWAYS found. I would then still need to double back through the output and change every instance of that result to 'Unknown'.
If I take the 'UniformOutput',false approach, how can I recover a scalar cell array at the end to easily manipulate it (e.g. pass it through unique)? I will admit I'm not 100% clear on scalar vs nonscalar cell arrays.
If there is some overall different approach that I'm not thinking of, I'm also open to it.
Tangential to the main question, I also tried using a single expression to run regexp using 3 tokens to pull out the values of thing1, thing2, and thing3 in one pass. This seems to require 'UniformOutput',false even when there are no empty results from regexp. I'm not sure how to get a scalar cell array using this approach (e.g. an Nx1 cell array where each cell is a 3x1 cell).
At the end of the day, I want to build a table using these results:
mytable=table(out1,out2,out3);
Edit: Using celldisp sheds some light on the problem:
celldisp(out)
out{1}{1} =
246
out{2} =
Unknown
out{3}{1} =
246
I assume that I need to change the structure of out so that the contents of out{1}{1} and out{3}{1} are instead just out{1} and out{3}. But I'm not sure how to accomplish this if I need 'UniformOutput',false.
Note: I've not used MATLAB and this doesn't answer the "efficient" aspect, but...
How about forcing there to always be a match?
Just thinking about you really wanting a match to skip this problem, how about an empty match?
Looking on the MATLAB help page here I can see a 'emptymatch' option, perhaps this is something to try.
E.g.
the_thing_i_want_to_find|
Match "the_thing_i_want_to_find" or an empty match, note the | character.
In capture group it might look like this:
(the_thing_i_want_to_find|)
As a workaround, I have found that using regexprep can be used to find entries where thing3 is missing. For example:
replace='$1 ''thing3'': ''Unknown'', ''morestuff''';
missingexpr='(?<=thing2'':\s?)(''?-?[\w\d().]*?''?,) ''morestuff''';
regexprep(mycell{2},missingexpr,replace)
ans =
''thing1': '617', 'thing2': '239', 'thing3': 'Unknown', 'morestuff':, '''
Applying it to the entire array:
fixedcell=cellfun(#(x) regexprep(x,missingexpr,replace),mycell);
out=cellfun(#(x) regexp(x,expr3,'tokens','once'),fixedcell,'UniformOutput',false);
This feels a little roundabout, but it works.
cellfun can be replaced with a plain old for loop. Your code will either be equally fast, or maybe even faster. cellfun is implemented with a loop anyway, there is no advantage of using it other than fewer lines of code. In your explicit loop, you can then check the output of regexp, and build your output array any way you like.

REGEXMATCH and MATCH don't work when a cell contains a number

I am trying to use formulas to find a row in my google spreadsheet document, however I have got a weird problem.
I am not able to find values when a cell contains a number (without any other characters).
Consider the following case
I have got two values
A1 - 32323232323
A2 - 323-23232-323
When I use the following formula
=FILTER(A:E,REGEXMATCH(B:B,"323-23232-323"))
It works fine, it successfully finds A2 value, however when I try to use the following formula
=FILTER(A:E,REGEXMATCH(B:B,"32323232323"))
It doesn't match any row, and I also tried the following formula
ADDRESS(MATCH("32323232323",B:B,0),1)
It doesn't work either, it only works when I remove quotes like that
ADDRESS(MATCH(32323232323,B:B,0),1)
But this doesn't work with REGEXMATCH.
Is there any way I can match numbers using a regex expression (exact number, without wildcards) ?
Thanks
=FILTER(A:A,REGEXMATCH(REGEXREPLACE(TO_TEXT(A:A),"-",""), "32323232323"))
to get both 323-23232-323 and 32323232323.
=FILTER(A:E,REGEXMATCH(TO_TEXT(B:B),"32323232323"))
to get number 32323232323.
Notes:
Converting to_text is a key here.
Change columns to yours.

Converting x//y into x/y" Excel

I'm looking for a solution to convert a text stored in an Excel cell from,
x//y
into
x/y"
(preferably using serach and replace if possible)
Values of x & y will keep changing from cell to cell however pattern will be the same.
I am using Excel 2007. A VBA solution if any will be fine also.
Try =CONCATENATE(SUBSTITUTE(C2,"//","/",1),"""")
Something like above works?
No VBA Required. Try this Excel formula
=MID(A1,1,FIND("//",A1,1)-1)&"/"&MID(A1,FIND("//",A1,1)+2,LEN(A1)-FIND("//",A1,1)-1)&""""
Explanation
=MID(A1,1,FIND("//",A1,1)-1) will give you X from X//Y
=MID(A1,FIND("//",A1,1)+2,LEN(A1)-FIND("//",A1,1)-1) will give you Y
Use next pattern of RegEx for replacing //
\/\/(\w+)
with /$1"

Can someone provide a regex for validating and parsing a csv of integers and reals

I am new to regex and struggling to create an expression to parse a csv containing 1 to n values. The values can be integers or real numbers. The sample inputs would be:
1
1,2,3,4,5
1,2.456, 3.08, 0.5, 7
This would be used in c#.
Thanks,
Jerry
Use a CSV parser instead of RegEx.
There are several options - see this SO questions and answers and this one for the different options (built into the BCL and third party libraries).
The BCL provides the TextFieldParser (within the VisualBasic namespace, but don't let that put you off it).
A third party library that is liked by many is filehelpers.
Using REGEX for CSV parsing has been a 10 year jihad for me. I have found it remarkably frustrating, due to the boundary cases:
Numbers come in a variety of forms (here in the US, Canada):
1
1.
1.0
1000
1000.
1,000
1e3
1.0e3
1.0e+3
1.0e+003
-1
-1.0 (etc)
But of course, Europe has traditionally been different with regard to commas and decimal points:
1
1,0
1000
1.000e3
1e3
1,0e3
1,0e+3
1,0e+003
Which just ruins everything. So, we ignore the German and French and Continental standard because the comma just is impossible to work out whether it is separating values, or part of values. (The Continent likes TAB instead of COMMA)
I'll assume that you're "just" looking for numerical values separated from each other by commas and possible space-padding. The expression:
\s*(\-?\d+(?:\.\d*)?(?:[eE][\-+]?\d*)?)\s*
is a pretty fair parser of A NUMBER. Catches just about every reasonable case. Doesn't deal with imbedded commas though! It also trims off spaces, either side of the number.
From there, you can either build an iterative CSV string decomposer (walking each field, absorbing commas, assigning to an array, say), or use the scanf type function to do the same thing. I do prefer the iterative decomposition method - as it also allows you to parse out strings, hexadecimal, and virtually any other pattern you find in the data.
The regex you want is
#"([+-]?\d+(?:\.\d+)?)(?:$|,\s*)"
...from which you'll want capture group 1. However, don't use regex for something like this. String manipulation is much better when the input is in a very static, predictable format:
string[] nums = strInput.split(", ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
List<float> results = (from n in nums
select float.Parse(n)).ToList();
If you do use regex, make sure you do a global capture.
I think you would have to loop it to check for an unknown number of ints... or else something like this:
/ *([0-9.]*) *,? *([0-9.]*) *,? *([0-9.]*) *,? *([0-9.]*) *,? *([0-9.]*) */
and you could keep that going ",?([0-9]*)" as far as you wanted to, to account for a lot of numbers. The result would be an array of numbers....
http://jsfiddle.net/8URvL/1/

Replacing Credit Card Numbers

I'm using an sql to replace credit card numbers with xxxx and finding that REGEX_REPLACE does not consistently replace everything. Below is the SET command i'm using on the SQL
SET COMMENTS_LONG =
REGEXP_REPLACE (COMMENTS_LONG,'\D[1-6]\d{3}.\d{4}.\d{4}.\d{3}(\d{1}.\d{3})?|\D[1-6]\d{12,15}|\D[1-6]\d{3}.\d{3}.?\d{3}.\d{5}', ' XXXXXXXXXXXXXXXX')
Before
Elizabeth aclled to change address.5430-6000-2111-1931 A
After
Elizabeth aclled to change address XXXXXXXXXXXXXXXX1 A
I tried increasing the number of X but result is the same. I also find that i have to put a space in front of the first X as it appears to move 1 char to the left.
I would't make the regex to specific, that increases the change of accidentially letting real numbers that don't match your expression pass to the end user.
I would just use a simple regex like this:
(\d+-){3}\d+
btw: why did you include \D at the beginning? The . is not part of the creditcard number, right?
Edit: Just found this regex
\b(?:\d[ -]*?){13,16}\b
at this site: http://www.regular-expressions.info/creditcard.html
You should read the paragraph "Finding Credit Card Numbers in Documents"