Excel Sum values by extracting numbers from single multi-line cell - regex

AL-CHE-P1-1518 --- 270
AL-CHE-P2-1318 --- 280
AL-MAT-P1-1218 --- 280
AL-MAT-P4-0918 --- 40
All of this data is inside the same cell, C2. My aim is to derive a formula that sums
270+280+280+40
in cell D2.
I tried regextract(c2,"\d(.*)\n"), but only the first "270" is extracted. I have searched through the forums and couldn't find an exact match. It would save me a huge amount of time if anyone could give me a hint on how to derive the sum from within a single cell string.

As far as I know, you can only accomplish this via a UDF:
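Function ReturnSum(rng As Range) As Long
    Dim arr As Variant
    Dim i As Long
    ' Split the cell text on line feeds (Alt+Enter line breaks)
    arr = Split(rng.Value, vbLf)
    For i = 0 To UBound(arr)
        ' Skip blank lines; add the number that follows " --- "
        If InStr(arr(i), " --- ") > 0 Then
            ReturnSum = ReturnSum + CLng(Trim(Split(arr(i), " --- ")(1)))
        End If
    Next i
End Function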

In Google Sheets:
=SUMPRODUCT(ARRAYFORMULA(REGEXEXTRACT(SPLIT(C2,CHAR(10))," \d+")))

In Excel the formula is a bit more complicated and an array formula:
=SUM(IFERROR(--MID(TRIM(MID(SUBSTITUTE(A1,CHAR(10),REPT(" ",99)),(ROW($XFD$1:INDEX(XFD:XFD,LEN(A1)-LEN(SUBSTITUTE(A1,CHAR(10),""))+1))-1)*99+1,99)),FIND("---",TRIM(MID(SUBSTITUTE(A1,CHAR(10),REPT(" ",99)),(ROW($XFD$1:INDEX(XFD:XFD,LEN(A1)-LEN(SUBSTITUTE(A1,CHAR(10),""))+1))-1)*99+1,99)))+3,99),0))
Being an array formula, it must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.

A cell containing a range of values and making calculations with that range

Is it possible to have a range of values in a cell so that Sheets understands it when calculating something?
Here's an example of the desired output:
     A        B         C
1    Value    Share     Total sum
2    100.00   90-110%   90-110
Here, Total sum (C2) = A2 * B2 (so 100 * 90-110%), giving a range of 90-110.
However, I don't know how to insert this range of values into a cell without Sheets saying #VALUE!.
You will need to do it like this:
=REGEXREPLACE((A2*REGEXEXTRACT(B2, "\d+")%)&"-"&
A2*REGEXEXTRACT(B2, "-(\d+%)"), "\.$", )
For decimals:
=REGEXREPLACE((A40*REGEXEXTRACT(B40, "\d+\.\d+|\d+")%)&"-"&
A40*REGEXEXTRACT(B40, "-(\d+\.\d+%)|-(\d+%)"), "\.$", )

Need help implementing a certain logic that will fill a text to a certain width.

The task is to justify text within a certain width.
user inputs: Hello my name is Harrry. This is a sample text input that nobody will enter.
output: What text width do you want?
user inputs: 15
output: |Hello  my  name|
        |is Harrry. This|
        |is   a   sample|
        |text       that|
        |nobody     will|
        |enter.         |
Basically, each line has to be 15 characters wide, including blank spaces. Also, if the next word can't fit within the 15 characters, it moves to the next line entirely. If there are multiple words in a line, the spaces are distributed evenly between the words. See the line that says "is a sample" for an example.
I read the input using getline(...) and the entire text is saved in a vector. However, I'm kind of stuck on moving forward. I tried using multiple for loops, but I just can't seem to break the lines or even out the spacing at all.
Again, I'm not looking for or expecting anyone to solve this, but I'd appreciate it if you could guide me in the right direction in terms of the logic/algorithm I should think about.
You should consider this dynamic programming solution.
Split text into “good” lines
Since we don't know where to break the lines for good justification, we start by guessing where the breaks in the paragraph should go. (That is, for each pair of adjacent words we guess whether to break between them and make the second word the start of the next line.)
Notice something? We are brute-forcing!
Also note that if we can't find a word small enough to fit in the remaining space on the current line, we insert spaces between the words on the current line. So the spacing on the current line depends on which words go onto the next or previous line. That's a dependency!
You are brute-forcing and you have a dependency: that is where DP comes in!
Now let's define a state that identifies our position on the path to solving this problem.
State: [i : j], which denotes the line of words from the ith word to the jth word in the original sequence of words given as input.
Now that you have a state for the problem, let us define how these states are related.
Since all our subproblem states are just piles of words, we can't simply compare the words in each state to determine which one is better. Here, better means using the line's width to hold the maximum number of characters with the minimum spacing between the words on that line. So we define a parameter that measures how good the words from the ith to the jth are as a line (recall our definition of the subproblem state). This is basically evaluating each of our subproblem states.
A simple comparison factor would be:
Define badness(i, j) for the line words[i : j].
For example,
Infinity if total length > page width,
else (page width − total length of words in current line)^3
To make things even simpler, consider only suffixes of the given text and apply this algorithm. This reduces the DP table size from N*N to N.
So, to finish, let's make clear what we want in DP terms:
subproblem = min. badness for suffix words[i :]
⇒ no. of subproblems = Θ(n), where n = no. of words
guessing = where to end the first line, say i : j
⇒ no. of choices for j = n − i = O(n)
recurrence relation between the subproblems:
• DP[i] = min(badness(i, j) + DP[j] for j in range(i + 1, n + 1))
• DP[n] = 0
⇒ time per subproblem = Θ(n)
so total time = Θ(n^2).
Also, I'll leave it to you to work out how to insert spaces between the words after determining the words on each line.
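For reference, the recurrence above could be sketched in C++ roughly like this. The names badness, breakLines and kInfinity are just mine for illustration, and I'm assuming that the "total length" of a line counts one space between adjacent words and that every single word fits within the width. It returns the index of the first word of each line; spacing the words out is still left to you.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

const long long kInfinity = std::numeric_limits<long long>::max();

// Badness of putting words[i..j) on one line: infinity if they do not fit,
// otherwise the cube of the unused width.
// Assumption: the line length includes one space between adjacent words.
long long badness(const std::vector<std::string>& words,
                  std::size_t i, std::size_t j, long long width) {
    long long length = static_cast<long long>(j - i) - 1;  // single spaces
    for (std::size_t k = i; k < j; ++k) {
        length += static_cast<long long>(words[k].size());
    }
    if (length > width) return kInfinity;
    long long slack = width - length;
    return slack * slack * slack;
}

// Returns the index of the first word of every line.
std::vector<std::size_t> breakLines(const std::vector<std::string>& words,
                                    long long width) {
    const std::size_t n = words.size();
    std::vector<long long> dp(n + 1, kInfinity);  // dp[i]: min badness of words[i..n)
    std::vector<std::size_t> next(n + 1, n);      // next[i]: where the following line starts
    dp[n] = 0;
    for (std::size_t i = n; i-- > 0;) {
        for (std::size_t j = i + 1; j <= n; ++j) {
            long long b = badness(words, i, j, width);
            if (b == kInfinity) break;         // more words only make the line longer
            if (dp[j] == kInfinity) continue;  // the rest cannot be laid out from j
            if (b + dp[j] < dp[i]) {
                dp[i] = b + dp[j];
                next[i] = j;
            }
        }
    }
    std::vector<std::size_t> starts;
    for (std::size_t i = 0; i < n; i = next[i]) starts.push_back(i);
    return starts;
}
```

With the words aaa, bb, cc, ddddd and a width of 6, this breaks after "aaa" rather than packing "aaa bb" onto the first line, because leaving "cc" alone on a line would cost more overall.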
Logic would be:
1) Put words in array
2) Loop through the array of words
3) Add up the number of chars in each word and keep going while the total is the text width or less (stop before the word that would take it over the text width). Remember the number of words that made up the total before going over 15 (for example, remember that it took 3 words to get 9 characters, leaving room for 6 spaces)
4) Divide the number of spaces required by (number of words - 1)
5) Write those words, writing the same number of spaces each time.
Should give the desired effect I hope.
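A minimal sketch of those steps in C++, in case it helps. The function name justify is made up, and a few details are my own assumptions rather than part of the steps above: at least one space is reserved between words when checking the fit, leftover spaces go to the leftmost gaps, a line with a single word is padded on the right, and a word longer than the width simply gets its own (overlong) line.

```cpp
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> justify(const std::vector<std::string>& words,
                                 std::size_t width) {
    std::vector<std::string> lines;
    std::size_t i = 0;
    while (i < words.size()) {
        // Steps 2-3: take words while they fit, reserving one space per gap.
        std::size_t j = i;
        std::size_t chars = 0;  // characters used by the words alone
        while (j < words.size() &&
               chars + words[j].size() + (j - i) <= width) {
            chars += words[j].size();
            ++j;
        }
        if (j == i) {  // assumption: an overlong word gets a line of its own
            chars = words[j].size();
            ++j;
        }
        std::size_t gaps = j - i - 1;
        std::size_t spaces = width > chars ? width - chars : 0;

        std::string line = "|";
        for (std::size_t k = i; k < j; ++k) {
            line += words[k];
            if (k + 1 < j) {
                // Step 4: spaces divided by (number of words - 1);
                // the leftmost gaps absorb any remainder.
                std::size_t pad = spaces / gaps + ((k - i) < spaces % gaps ? 1 : 0);
                line.append(pad, ' ');
            } else if (gaps == 0) {
                line.append(spaces, ' ');  // single word: pad on the right
            }
        }
        line += "|";
        lines.push_back(line);
        i = j;
    }
    return lines;
}
```

Fed the sentence from the question split into words and a width of 15, the first line comes out as |Hello  my  name| and the last one as |enter.         |.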
You obviously have some idea how to solve this, as you have already produced the sample output.
Perhaps re-solve your original problem writing down in words what you do in each step....
e.g.
Print text asking for sentence.
Take input
Split input into words.
Print text asking for width.
...
If you are stuck at any level, then expand the details into sub-steps.
I would look to separate the problem into two parts: first working out a sequence of words which will fit onto a line,
then working out how many spaces to add between each of the words.
Below is an example of printing one line once you know how many words to print and which word starts the line.
std::cout << "|";
int numOfSpaces = lineWidth - numOfCharsUsedByWords;
/*
 * If we have three words |word1 word2 word3| in a line,
 * the number of gaps to fill between them is 1 less than the number of words
 */
int spaceChunks = numOfWordsInLine - 1;
/*
 * Print the words from the starting point up to the number of words
 * you can print in a line
 */
for (int j = 0; j < numOfWordsInLine; ++j) {
    /*
     * Calculation for the number of spaces to print
     * after every word
     */
    int spacesToPrint = 0;
    if (spaceChunks <= 1) {
        /*
         * At most one chunk of spaces is left (one or two words,
         * or the end of the line), so use up all the remaining spaces
         */
        spacesToPrint = numOfSpaces;
    } else {
        /*
         * Here it's just segmenting a number into chunks
         * example: segment 7 into 3 parts, will become 3 + 2 + 2
         * 7 to 3 = (7%3) + (7/3) = 1 + 2 = 3
         * 4 to 2 = (4%2) + (4/2) = 0 + 2 = 2
         * 2 to 1 = (2%1) + (2/1) = 0 + 2 = 2
         */
        spacesToPrint = (numOfSpaces % spaceChunks) + (numOfSpaces / spaceChunks);
    }
    numOfSpaces -= spacesToPrint;
    spaceChunks--;
    std::cout << words[j + lineStartIdx];
    for (int space = 0; space < spacesToPrint; space++) {
        std::cout << " ";
    }
}
std::cout << "|" << std::endl;
Hope this code helps. Also, you need to consider what happens if the width is set to less than the length of the longest word.

Sql, Compged, Min and blanks

I'm comparing 4 strings using compged in SQL. Here is an extract:
MIN(compged(a.string1,b.string1),
compged(a.string1,b.string2),
compged(a.string2,b.string1),
compged(a.string2,b.string2)) < 200
Unfortunately, there are times when a string from set a and a string from set b are both blank/empty. This means compged resolves to 0 and the min found is 0. Is there a way to modify this so that comparing two blank strings gives a value greater than 200 or something?
Thanks in advance
You can calculate new variables to handle that situation (both compared variables are blank) and use them inside the MIN() function:
case
when (missing(a.string1) and missing(b.string1)) then 300
else compged(a.string1,b.string1)
end as compged_11,
/* do the same for combinations 12, 21 and 22 */
MIN(calculated compged_11,
calculated compged_12,
calculated compged_21,
calculated compged_22) < 200
The quick and dirty option is to replace each string with a different 200-character string when it is null or has length 0 (empty strings aren't always represented as NULL).
So a.string1 = 200*'Z', b.string1 = 200*'X'.....
Or, better still, wrap each call with checks so that if a.string1 is null or empty, the length of the other string is returned, and if both are empty, 1000 is returned so the record is removed by the where clause.
You can also add a prefix, say 'A', to all strings. This ensures that there are no empty strings and does not change the distance. But you still need to weed out the cases where both strings are empty.

Creating a histogram with C++ (Homework)

In my c++ class, we got assigned pairs. Normally I can come up with an effective algorithm quite easily, this time I cannot figure out how to do this to save my life.
What I am looking for is someone to explain an algorithm (or just give me tips on what would work) in order to get this done. I'm still at the planning stage and want to get this code done on my own in order to learn. I just need a little help to get there.
We have to create histograms based on a 4 or 5 integer input. It is supposed to look something like this:
Calling histo(5, 4, 6, 2) should produce output that appears like:
    *
*   *
* * *
* * *
* * * *
* * * *
-------
A B C D
The formatting to this is just killing me. What makes it worse is that we cannot use any type of arrays or "advanced" sorting systems using other libraries.
At first I thought I could arrange the values from highest to lowest order. But then I realized I did not know how to do this without using the sort function and I was not sure how to go on from there.
Kudos for anyone who could help me get started on this assignment. :)
Try something along the lines of this:
1. Determine the largest number in the histogram
2. Use a loop like this to construct the histogram:
for(int i = largest; i >= 1; i--)
Inside the body of the loop, do steps 3 to 5 inclusive
3. If i <= value_of_column_a then print a *, otherwise print a space
4. Repeat step 3 for each column (or write a loop...)
5. Print a newline character
6. Print the horizontal line using -
7. Print the column labels
Maybe I'm mistaken about your question, but if you know how many items are in each column, it should be pretty easy to print them like your example:
Step 1: Find the max of the numbers and store it in a variable, noting which column it belongs to.
Step 2: Print spaces until you get to the column with the max. Print a star. Print the remaining stars / spaces. Add a \n character.
Step 3: Find the next max. Print stars in the columns whose value is >= that max, otherwise print a space. Add a newline at the end.
Step 4: Repeat step 3 (until the stop condition below).
When the tallest column has as many stars as the largest max, you've printed all of them.
Step 5: Add the -------- line and a \n.
Step 6: Add the column labels and a \n.
If I understood the problem correctly, I think it can be solved like this:
a = <array of the numbers entered>
N = <number of numbers entered> = length(a)
T = N                                //This variable is used to
                                     //determine if we have finished
                                     //and it will change its value
Alph = {A,B,C,D,E,F,G,..., Z}        //A constant array containing the alphabet
                                     //We will use it to print the label row
for (i=1 to N) {print Alph[i]+" "};  //Prints the letters (plus space),
print newline;                       //one for each number entered
for (i=1 to N) {print "--"};         //Prints two dashes per letter below
print newline;                       //the letters, one pair for each
while (T!=0) do {
  for (i=1 to N) do {
    if (a[i]>0) {print "* "; a[i]--;} else {print "  "; T--;};
  };
  print newline;
  if (T!=0) {T=N};
}
What this does is, for each non-zero number entered, print a * and then decrease that number. When one of the numbers becomes zero, it stops printing *s for that column. When all the numbers have become zero, it stops (this happens when T comes out of the for loop as zero, which is what the variable T is for). Note that, as written, the labels are printed first, so the bars grow downward from them.
I think the problem wasn't really about histograms. Notice it also doesn't require sorting or even knowing the

Finding all possible common substrings from a file consisting of strings using c++

I am trying to find all possible common strings in a file consisting of strings of various lengths. Can anybody help me out?
E.g. the input file (which is sorted) is:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG
AAAAAAAATTAGGCTGGG
AAAAAAAATTGAAACATCTATAGGTC
AAAAAAACTCTACCTCTCT
AAAAAAACTCTACCTCTCTATACTAATCTCCCTACA
and my desired output is:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG
AAAAAAAATTAGGCTGGG
AAAAAAAATTGAAACATCTATAGGTC
AAAAAAACTCTACCTCTCTATACTAATCTCCCTACA
[EDIT] Each line which is a substring of any other line should be removed.
Basically, for each line, compare it with the next line to see whether the next line is no longer than the current one, or whether the start of the next line (truncated to the current line's length) differs from the current line. If either is true, the line is unique. This can be done in a single linear pass because the list is sorted: any entry that begins with the current entry will follow it immediately.
A non-algorithmic optimization (micro-optimization) is to avoid the use of substr, which creates a new string. We can simply compare against the other string as though it were truncated, without actually creating a truncated string.
vector<string> unique_lines;
for (size_t j = 0; j + 1 < lines.size(); ++j)
{
    const string& line = lines[j];
    const string& next_line = lines[j + 1];
    // If the line is not a prefix of the next line,
    // add it to the list of unique lines.
    // compare() checks the prefix in place, without creating a substring.
    if (line.size() >= next_line.size() ||
        next_line.compare(0, line.size(), line) != 0)
        unique_lines.push_back(line);
}
// The last line is guaranteed not to be a prefix of any
// previous line as the lines are sorted.
if (!lines.empty())
    unique_lines.push_back(lines.back());
// The desired output will be contained in 'unique_lines'.
As I understand it, you want to find every string that is a substring of another string and remove it.
For that, you can use the strstr function to check whether one string is a substring of another.
Hope this helps.
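A rough sketch of what that could look like, assuming the lines are already loaded into a vector (the function name removeContained is made up). It compares every pair of lines, so it is quadratic in the number of lines, but unlike a prefix check it also catches a line that occurs in the middle of another one. How to treat exact duplicates is my own choice here: only the first copy is kept.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

std::vector<std::string> removeContained(const std::vector<std::string>& lines) {
    std::vector<std::string> kept;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        bool contained = false;
        for (std::size_t j = 0; j < lines.size() && !contained; ++j) {
            if (i == j) continue;
            // Identical lines contain each other; keep only the first copy.
            if (lines[i] == lines[j] && i < j) continue;
            // strstr returns non-null when lines[i] occurs inside lines[j].
            contained = std::strstr(lines[j].c_str(), lines[i].c_str()) != nullptr;
        }
        if (!contained) kept.push_back(lines[i]);
    }
    return kept;
}
```

For example, given AAA, AAAA, AAAC, AATT, ACT, ACTG it keeps AAAA, AAAC, AATT and ACTG.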
Well, this is probably not the fastest solution to your problem, but it seems easy to implement. You just keep a histogram of chars that represents a signature of a string. For each string that you read (separated by spaces), you count the occurrences of each char and store the string in your answer only if there isn't already another string with the same count for every char. Let me illustrate it:
aaa bbb aabb ab aaa
Here we have just two possible input letters, so we just need a histogram of size 2.
aaa - hist[0] = 3, hist[1] = 0 : New one - add to the answer
bbb - hist[0] = 0, hist[1] = 3 : New one - add to the answer
aabb - hist[0] = 2, hist[1] = 2 : New one - add to the answer
ab - hist[0] = 1, hist[1] = 1 : New one - add to the answer
aaa - hist[0] = 3, hist[1] = 0 : Already exists! Don't add to the answer.
The bottleneck of your implementation will be the histogram comparisons, and there are a lot of possible implementations for them.
The simplest one would be a linear search, iterating through all your previous answers and comparing each with the current histogram, which would be O(1) to store and O(n) to search. If you have a big file, it would take hours to finish.
A faster one, but a lot more troublesome to implement, would use a hash table to store your answer, and use the histogram signature to generate the hash code. It would be too troublesome to explain this approach here.
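To give a rough idea of the signature approach, here is a small sketch. Note that I'm using an ordered std::set keyed on the histogram instead of a hash table, simply because it needs no custom hash function, so lookups are logarithmic rather than constant time. The function name uniqueBySignature and the 256-entry histogram (one counter per possible byte value) are my assumptions. Keep in mind that this only filters out strings with identical character counts.

```cpp
#include <array>
#include <set>
#include <string>
#include <vector>

std::vector<std::string> uniqueBySignature(const std::vector<std::string>& words) {
    std::set<std::array<int, 256>> seen;  // signatures already in the answer
    std::vector<std::string> answer;
    for (const std::string& word : words) {
        std::array<int, 256> hist{};      // zero-initialised character counts
        for (unsigned char ch : word) ++hist[ch];
        // insert() reports whether this signature was new.
        if (seen.insert(hist).second) answer.push_back(word);
    }
    return answer;
}
```

With the input aaa bbb aabb ab aaa from the illustration, the second aaa is dropped and the other four strings are kept.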