I have the following string:
1,1,1,0,1,1,2,1,1,1,1,2,1,1,1,0,1,1,0,1,1,1,
0-->rupture
2-->continuity
When I have 1s between two 0s, it means that I have a document:
[0,1,1,1,0] = D
When I have 1s between a 2 and a 0, it means that I have a fragment [2,1,...,1,0] = f. I add all the fragments to a list of fragments F, and the 0 signifies the end of the sub-fragments.
When I have 1s between a 2 and a 2, it means that I also have a fragment [2,1,...,1,2] = f.
As a solution I must have in the end :
3 documents D1,D2,D3 which are located between the indices [0,3],
[15,18], [18,21]
A Fragment F between [3,15] containing 3 sub-fragments , f1 between [3,6] ,f2 between [6,11] and f3 between [11,15].
Note: We consider that the string starts with a 0 and ends with a 0
This is why we have a document between [0,3] and another document between [18,21]
I am trying to formulate this problem but I can't come up with a solid idea. Please tell me if it is clear, and what algorithm I could use to help solve it. Can I use a specific data structure, like a tree?
Thank you,
Hani.
If your string is:
1,1,1,0,1,1,2,1,1,1,1,2,1,1,1,0,1,1,0,1,1,1
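Initialize lastPos = 0 and lastType = 0 (lastType is 0 if the last marker seen was a 0, and 2 if it was a 2).
Traverse the array. You find the next 0 at position 3. Since lastType was equal to 0, you know that you have found a sequence of 1s between two zeroes, so handle it however you need (here, record a document).
Set lastType = 0 and lastPos = 3.
Continue like this till the end, updating lastType and lastPos at every 0 or 2.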
Order of time complexity: O(n)
Order of space complexity: O(1)
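A minimal Python sketch of the single pass described above. It assumes an implicit 0 at index 0 and an implicit 0 at the last index (which is how the question gets the spans [0,3] and [18,21]), and it treats a span that starts at a 0 and ends at a 2 as the first sub-fragment of a fragment. The names segment, documents, fragments and current are made up for illustration.

```python
def segment(values):
    """Single pass over a 0/1/2 sequence, tracking the last marker seen."""
    documents = []   # (start, end) spans lying between two 0s
    fragments = []   # one list of (start, end) sub-fragment spans per fragment
    current = []     # sub-fragments of the fragment currently being built
    last_pos, last_type = 0, 0          # assumed implicit 0 at the start
    last_index = len(values) - 1
    for pos, value in enumerate(values):
        if value == 1 and pos != last_index:
            continue                    # 1s carry no boundary information
        marker = 0 if value == 1 else value   # assumed implicit 0 at the end
        if pos > last_pos:              # nothing to record for a marker at index 0
            if last_type == 0 and marker == 0:
                documents.append((last_pos, pos))
            else:
                current.append((last_pos, pos))
                if marker == 0:         # a 0 closes the whole fragment
                    fragments.append(current)
                    current = []
        last_pos, last_type = pos, marker
    if current:                         # input ended on a 2
        fragments.append(current)
    return documents, fragments


values = [1,1,1,0,1,1,2,1,1,1,1,2,1,1,1,0,1,1,0,1,1,1]
documents, fragments = segment(values)
print(documents)   # [(0, 3), (15, 18), (18, 21)]
print(fragments)   # [[(3, 6), (6, 11), (11, 15)]]
```

The traversal itself only needs lastPos and lastType; the lists are there just to collect the output.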
I would like to extract integers from strings from a cell array in Matlab. Each string contains 1 or 2 integers formatted as shown below. Each number can be one or two digits. I would like to convert each string to a 1x2 array. If there is only one number in the string, the second column should be -1. If there are two numbers then the first entry should be the first number, and the second entry should be the second number.
'[1, 2]'
'[3]'
'[10, 3]'
'[1, 12]'
'[11, 12]'
Thank you very much!
I have tried a few different methods that did not work out. I think that I need to use regex and am having difficulty finding the proper expression.
You can use str2num to convert well formatted chars (which you appear to have) to the correct arrays/scalars. Then simply pad from the end+1 element to the 2nd element (note this is nothing in the case there's already two elements) with the value -1.
This is most clearly done in a small loop, see the comments for details:
% Set up the input
c = { ...
'[1, 2]'
'[3]'
'[10, 3]'
'[1, 12]'
'[11, 12]'
};
n = cell(size(c)); % Initialise output
for ii = 1:numel(n)              % Loop over chars in 'c'
    n{ii} = str2num(c{ii});      % convert char to numeric array
    n{ii}(end+1:2) = -1;         % Extend (if needed) to 2 elements = -1
end
% (Optional) Convert from a cell to an Nx2 array
n = cell2mat(n);
If you really wanted to use regex, you could replace the loop part with something similar:
n = regexp( c, '\d{1,2}', 'match' ); % Match between one and two digits
for ii = 1:numel(n)
    n{ii} = str2double(n{ii});   % Convert cellstr of chars to arrays
    n{ii}(end+1:2) = -1;         % Pad to be at least 2 elements
end
But there are lots of ways to do this without touching regex, for example you could erase the square brackets, split on a comma, and pad with -1 according to whether or not there's a comma in each row. Wrap it all in a much harder to read (vs a loop) cellfun and ta-dah you get a one-liner:
n = cellfun( @(x) [str2double( strsplit( erase(x,{'[',']'}), ',' ) ), -1*ones(1,1-nnz(x==','))], c, 'uni', 0 );
I'd recommend one of the loops for ease of reading and debugging.
I want to find the number of 0s at the end of an integer.
Eg for 2020 it should count 1
for 2000 it should count 3
for 3010000 it should count 4
I have no idea how to do it without counting all the zeros instead of just the trailing ones!
someone please help :)
Go to the Power Query Editor and add a Custom Column with the code below:
if Number.Mod([number],100000) = 0 then 5
else if Number.Mod([number],10000) = 0 then 4
else if Number.Mod([number],1000) = 0 then 3
else if Number.Mod([number],100) = 0 then 2
else if Number.Mod([number],10) = 0 then 1
else 0
This assumes the highest possible number of trailing 0s is 5. You can add more if/else cases following the same logic if you expect more consecutive 0s at the end.
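The chain of Number.Mod checks above generalises to a loop that keeps dividing by 10 while the remainder is 0. The sketch below shows that logic in Python rather than Power Query M, so it is only an illustration of the idea; the function name trailing_zeros is made up, and returning 0 for the value 0 is an assumption.

```python
def trailing_zeros(number):
    """Count the trailing zero digits of an integer."""
    number = abs(number)
    if number == 0:
        return 0            # assumption: report 0 for the value 0 itself
    count = 0
    while number % 10 == 0: # same test as Number.Mod([number], 10) = 0
        number //= 10       # drop the last digit and check again
        count += 1
    return count


print(trailing_zeros(2020))     # 1
print(trailing_zeros(2000))     # 3
print(trailing_zeros(3010000))  # 4
```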
Take advantage of the fact that the text "00123", converted to a number and back to text, will be 2 characters shorter.
= let
TxtRev = Text.Reverse(Number.ToText([num]))&"1", /*convert to text and reverse, add 1 to handle num being 0*/
TxtNoZeroes = Number.ToText(Number.FromText(TxtRev)) /*convert to number to remove starting zeroes and then back to text*/
in
Text.Length(TxtRev)-Text.Length(TxtNoZeroes) /*compare length of original value with length without zeroes*/
This will work for any number of trailing zeroes (up to Int64 capacity of course, minus space for &"1"). Assuming that the column is of number type; if it's a text then just remove Number.ToText in TxtRev. If you have negative numbers or decimals, replace characters not being a digit after converting to text. For initial number being 0 it shows 1, but if it should show 0 just remove &"1".
You can do it as general string manipulation:
= Text.Length(Text.From([number])) - Text.Length(Text.TrimEnd(Text.From([number]), "0"))
We convert the column to a string, strip off the trailing zeroes, and subtract the resulting length from the total length, which gives the number of stripped zeroes.
Edit: I messed up my first answer, this one should in fact be correct
The task is to justify text within a certain width.
user inputs: Hello my name is Harrry. This is a sample text input that nobody
will enter.
output: What text width do you want?
user inputs: 15
output: |Hello  my  name|
|is Harrry. This|
|is   a   sample|
|text       that|
|nobody     will|
|enter.         |
Basically, each line has to be 15 characters wide, including blank spaces. Also, if the next word can't fit within the 15 characters, it is left off that line entirely. If there are multiple words in a line, the spaces are distributed evenly between the words. See the line that says "is a sample" for example.
I read the input using getline(...) and saved the entire text in a vector. However, I'm kind of stuck on moving forward. I tried using multiple for loops, but I just can't seem to skip lines or even out the spacing at all.
Again, I'm not looking for or expecting anyone to solve this, but I'd appreciate it if you could guide me in the right direction in terms of the logic/algorithm I should think about.
You should consider this Dynamic programming solution.
Split text into “good” lines
Since we don't know where to break the lines for good justification, we start by guessing where each break should go. (That is, for each pair of adjacent words we guess whether to break between them and make the second word the start of the next line.)
Notice something? We are brute-forcing!
Also note that if no word is small enough to fit in the remaining space on the current line, we insert spaces between the words on the current line. So the spacing on the current line depends on which words go onto the next or previous line. That's a dependency!
You are brute-forcing and you have dependencies: that is where DP comes in!
Now let's define a state to identify our position on the path to solving this problem.
State: [i : j], which denotes a line of words from the ith word to the jth word in the original sequence of words given as input.
Now that you have a state for the problem, let us try to define how these states are related.
Since all our subproblem states are just piles of words, we can't simply compare the words in each state to determine which one is better. Here, "better" refers to how well a line uses its width: as many characters and as few extra spaces between words as possible. So we define a parameter that measures the goodness of using the ith to jth words to make a line (recall our definition of the subproblem state). This is basically a way of evaluating each subproblem state.
A simple comparison factor would be :
Define badness(i, j) for the line made of words[i : j].
For example,
Infinity if total length > page width,
else (page width − total length of words in current line)^3
To make things even simpler, consider only suffixes of the given text and apply this algorithm. This reduces the DP table size from N*N to N.
So, to finish, let's make clear what we want in DP terms:
subproblem = min. badness for suffix words[i :]
=> no. of subproblems = Θ(n), where n = no. of words
guessing = where to end the first line, say i : j
=> no. of choices for j = n − i = O(n)
recurrence relation between the subproblems:
• DP[i] = min(badness(i, j) + DP[j] for j in range(i + 1, n + 1))
• DP[n] = 0
=> time per subproblem = Θ(n)
so, total time = Θ(n^2).
Also, I'll leave it to you to work out how to insert spaces between words after determining the words in each line.
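To make the recurrence concrete, here is a rough Python sketch of it. It assumes that the "total length" in badness counts one space between adjacent words, and it uses prefix sums so each badness lookup is O(1), keeping the total at Θ(n^2). The names split_lines, prefix and choice are made up for illustration.

```python
import math
from itertools import accumulate


def split_lines(words, width):
    """Return the words grouped into lines that minimise total badness."""
    n = len(words)
    # prefix[k] = total number of characters in words[:k]
    prefix = [0] + list(accumulate(len(w) for w in words))

    def badness(i, j):
        # assumption: line length = word characters + one space per gap
        total = prefix[j] - prefix[i] + (j - i - 1)
        return math.inf if total > width else (width - total) ** 3

    dp = [0.0] * (n + 1)      # dp[i] = min badness for the suffix words[i:]; dp[n] = 0
    choice = [n] * (n + 1)    # choice[i] = index where the first line of words[i:] ends
    for i in range(n - 1, -1, -1):
        dp[i], choice[i] = min(
            (badness(i, j) + dp[j], j) for j in range(i + 1, n + 1)
        )

    # follow the recorded choices to rebuild the lines
    lines, i = [], 0
    while i < n:
        lines.append(words[i:choice[i]])
        i = choice[i]
    return lines


print(split_lines(["aaa", "bb", "cc", "ddddd"], 6))
# [['aaa'], ['bb', 'cc'], ['ddddd']]
```

In this small example a greedy split would put "aaa bb" on the first line and leave "cc" alone on the second (total badness 65), while the DP prefers the more even split shown (total badness 29).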
Logic would be:
1) Put words in array
2) Loop through the array of words
3) Count the number of chars in each word, and check until they are the text width or less (skip if more than textwidth). Remember the number of words that make up the total before going over 15 (example remember it took 3 words to get 9 characters, leaving space for 6 spaces)
4) Divide the number of spaces required by (number of words - 1)
5) Write those words, writing the same number of spaces each time.
Should give the desired effect I hope.
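The five steps above could look roughly like this in Python. It is only a sketch: words longer than the width are skipped (as step 3 suggests), leftover spaces that don't divide evenly go to the leftmost gaps, and a line with a single word is padded on the right. Those last two choices are assumptions, and the name justify is made up.

```python
def justify(text, width):
    # steps 1-3: put the words in a list, skipping words that can never fit
    words = [w for w in text.split() if len(w) <= width]
    lines, line, used = [], [], 0   # used = characters taken by the words in `line`
    for word in words:
        # the word fits only if there is also room for one space per word already on the line
        if line and used + len(word) + len(line) > width:
            lines.append(line)
            line, used = [], 0
        line.append(word)
        used += len(word)
    if line:
        lines.append(line)

    out = []
    for line in lines:
        spaces = width - sum(len(w) for w in line)
        gaps = len(line) - 1
        if gaps == 0:               # single word: pad on the right
            out.append(line[0] + " " * spaces)
            continue
        # step 4: divide the spaces by (number of words - 1)
        each, extra = divmod(spaces, gaps)
        parts = []
        for k, word in enumerate(line[:-1]):
            # step 5: write the word, then its share of the spaces
            parts.append(word + " " * (each + (1 if k < extra else 0)))
        parts.append(line[-1])
        out.append("".join(parts))
    return out


text = "Hello my name is Harrry. This is a sample text input that nobody will enter."
for row in justify(text, 15):
    print("|" + row + "|")
```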
You obviously have some idea how to solve this, as you have already produced the sample output.
Perhaps re-solve your original problem writing down in words what you do in each step....
e.g.
Print text asking for sentence.
Take input
Split input into words.
Print text asking for width.
...
If you are stuck at any level, then expand the details into sub-steps.
I would look to separate the problem of working out a sequence of words which will fit onto a line.
Then how many spaces to add between each of the words.
Below is an example for printing one line after you find how many words to print and what is the starting word of the line.
// Assumes lineWidth, numOfCharsUsedByWords, numOfWordsInLine,
// lineStartIdx and words have already been computed for this line.
std::cout << "|";
int numOfSpaces = lineWidth - numOfCharsUsedByWords;
/*
 * If we have three words |word1 word2 word3| in a line
 * ideally the spaces to print between them are 1 less than the words
 */
int spaceChunks = numOfWordsInLine - 1;
/*
 * Print the words from starting point to num of words
 * you can print in a line
 */
for (int j = 0; j < numOfWordsInLine; ++j) {
    /*
     * Calculation for the number of spaces to print
     * after every word
     */
    int spacesToPrint = 0;
    if (spaceChunks <= 1) {
        /*
         * if one/two words then one
         * chunk of spaces between so fill them up
         */
        spacesToPrint = numOfSpaces;
    } else {
        /*
         * Here its just segmenting a number into chunks
         * example: segment 7 into 3 parts, will become 3 + 2 + 2
         * 7 to 3 = (7%3) + (7/3) = 1 + 2 = 3
         * 4 to 2 = (4%2) + (4/2) = 0 + 2 = 2
         * 2 to 1 = (2%1) + (2/1) = 0 + 2 = 2
         */
        spacesToPrint = (numOfSpaces % spaceChunks) + (numOfSpaces / spaceChunks);
    }
    numOfSpaces -= spacesToPrint;
    spaceChunks--;
    std::cout << words[j + lineStartIdx];
    for (int space = 0; space < spacesToPrint; space++) {
        std::cout << " ";
    }
}
std::cout << "|" << std::endl;
Hope this code helps. Also, you need to consider what happens if you set the width to less than the longest word's length.
I noticed that the following two for-loop cases behave differently sometimes while most of the time they are the same. I couldn't figure out the pattern, does anyone have any idea? Thanks!
case 1:
for (i <- myList.length - 1 to 0 by -1) { ... }
case 2:
for (i <- myList.length - 1 to 0) { ...}
Well, they definitely don't do the same things. n to 0 by -1 means "start at n and go to 0, counting backwards by 1". So:
5 to 0 by -1
// res0: scala.collection.immutable.Range = Range(5, 4, 3, 2, 1, 0)
Whereas n to 0 means "start at n and go to 0, counting forward by 1". But you'll notice that if n > 0, then there will be nothing in that list, since there is no way to count forward to 0 from anything greater than zero.
5 to 0
// res1: scala.collection.immutable.Range.Inclusive = Range()
The only way that they would produce the same result is if n=0 since counting from 0 to 0 is the same forwards and backwards:
0 to 0 by -1 // Range(0)
0 to 0 // Range(0)
In your case, since you're starting at myList.length - 1, they will produce the same result when the length of myList is 1.
In summary, the first version makes sense, because you want to count down to 0 by counting backward (by -1). And the second version doesn't make sense because you're not going to want to count forward to 0 from a length (which is necessarily non-negative).
First, we need to learn more about how the value members to and by work.
to (see the Scala API documentation)
to is a value member that appears in classes like Int, Double, etc.
scala> 1 to 3
res35: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3)
You don't have to write start to end by step; start.to(end, step) also works if you are more comfortable with method-call syntax. For integer inputs, to returns a Range.Inclusive object.
by (see the Scala API documentation)
Create a new range with the start and end values of this range and a new step
scala> Range(1,8) by 3
res54: scala.collection.immutable.Range = Range(1, 4, 7)
scala> Range(1,8).by(3)
res55: scala.collection.immutable.Range = Range(1, 4, 7)
Finally, let's look at what happens when the step points in the opposite direction from start to end, as in 1 to 3 by -1.
Here is the source code of the Range class and it is actually pretty straightforward to read:
def by(step: Int): Range = copy(start, end, step)
So by is actually calling a function copy, so what is copy?
protected def copy(start: Int, end: Int, step: Int): Range = new Range(start, end, step)
So copy simply creates a new Range with a different step. Next, let's look at the constructor of Range itself.
Consider this piece of code:
override final val isEmpty = (
(start > end && step > 0)
|| (start < end && step < 0)
|| (start == end && !isInclusive)
)
When any of these conditions is true, the range is empty; no exception is thrown. That is why 1 to 3 by -1 gives you an empty Range.
Sorry the length of my post is getting out of control since I am also learning Scala now.
Why don't you just read the source code of Range, it is written by Martin Odersky and it is only 500 lines including comments :)
I am trying to find all possible common strings from a file consisting of strings of various lengths. Can anybody help me out?
E.g input file is sorted:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG
AAAAAAAATTAGGCTGGG
AAAAAAAATTGAAACATCTATAGGTC
AAAAAAACTCTACCTCTCT
AAAAAAACTCTACCTCTCTATACTAATCTCCCTACA
and my desired output is:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG
AAAAAAAATTAGGCTGGG
AAAAAAAATTGAAACATCTATAGGTC
AAAAAAACTCTACCTCTCTATACTAATCTCCCTACA
[EDIT] Each line which is a substring of any other line should be removed.
For each line, compare it with the next line: if the next line is not longer than the current line, or the next line's prefix of the same length differs from the current line, then the current line is unique. This can be done in a single linear pass because the list is sorted: any entry that starts with the current entry will follow it directly.
A non-algorithmic optimization (micro-optimization) is to avoid the use of substr, which creates a new string. We can simply compare against the other string as though it were truncated, without actually creating a truncated string.
vector<string> unique_lines;
for (size_t j = 0; j + 1 < lines.size(); ++j)
{
    const string& line = lines[j];
    const string& next_line = lines[j + 1];
    // If the line is not a prefix of the next line,
    // add it to the list of unique lines.
    // compare() checks the prefix in place, without the copy substr would make.
    if (line.size() >= next_line.size() ||
        next_line.compare(0, line.size(), line) != 0)
        unique_lines.push_back(line);
}
// The last line is guaranteed to not be a prefix of any
// previous line as the lines are sorted.
if (!lines.empty())
    unique_lines.push_back(lines.back());
// The desired output will be contained in 'unique_lines'.
As I understand it, you want to remove every string that is a substring of some other string.
For that, you can use strstr (or std::string::find in C++) to check whether one string is a substring of another.
Hope this helps.
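As a rough illustration of that substring check, here is a brute-force Python sketch that compares every line against every other line. It uses Python's in operator in place of strstr, drops exact duplicates first (an assumption about what you'd want), and does not rely on the input being sorted. It is quadratic in the number of lines, and the name drop_contained is made up.

```python
def drop_contained(lines):
    """Keep only the lines that are not a substring of some other line."""
    unique = list(dict.fromkeys(lines))     # drop exact duplicates, keep order
    return [
        line for line in unique
        if not any(line != other and line in other for other in unique)
    ]


lines = [
    "A" * 30,
    "A" * 36,
    "A" * 35 + "C",
    "A" * 35 + "G",
    "AAAAAAAATTAGGCTGGG",
    "AAAAAAAATTGAAACATCTATAGGTC",
    "AAAAAAACTCTACCTCTCT",
    "AAAAAAACTCTACCTCTCTATACTAATCTCCCTACA",
]
for line in drop_contained(lines):
    print(line)
```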
Well, this is probably not the fastest solution to your problem, but it seems easy to implement. You keep a histogram of chars that represents the signature of a string. For each string that you read (separated by spaces), you count the occurrences of each char and store the string in your answer only if no other string has the same count for every char. Let me illustrate:
aaa bbb aabb ab aaa
Here we have just two possible input letters, so we just need a histogram of size 2.
aaa - hist[0] = 3, hist[1] = 0 : New one - add to the answer
bbb - hist[0] = 0, hist[1] = 3 : New one - add to the answer
aabb - hist[0] = 2, hist[1] = 2 : New one - add to the answer
ab - hist[0] = 1, hist[1] = 1 : New one - add to the answer
aaa - hist[0] = 3, hist[1] = 0 : Already exists! Don't add to the answer.
The bottleneck of your implementation will be the histogram comparisons, and there are a lot of possible implementations for it.
The simplest one would be a linear search, iterating through all your previous answers and comparing each with the current histogram, which would be O(1) to store and O(n) to search. If you have a big file, it could take hours to finish.
A faster one, but a lot more troublesome to implement, would use a hash table to store your answers, using the histogram signature to generate the hash code. It would be too troublesome to explain this approach here.
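For what it's worth, in a language with built-in hash sets the hash-table variant is short. The Python sketch below builds the histogram with collections.Counter and uses a frozenset of its (char, count) pairs as the hashable signature; the name unique_by_histogram is made up. Note that, as with the linear-search version, two different strings with the same character counts (for example "ab" and "ba") share a signature.

```python
from collections import Counter


def unique_by_histogram(text):
    """Keep the first string seen for each distinct character histogram."""
    seen = set()      # signatures of the histograms already in the answer
    answer = []
    for word in text.split():
        # hashable signature of the histogram: the set of (char, count) pairs
        signature = frozenset(Counter(word).items())
        if signature not in seen:       # average O(1) lookup instead of O(n)
            seen.add(signature)
            answer.append(word)
    return answer


print(unique_by_histogram("aaa bbb aabb ab aaa"))
# ['aaa', 'bbb', 'aabb', 'ab']
```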