Let's say that we have a string declared...
string paragraphy = "This is a really really long string containing a paragraph long content. I want to wrap this text by using a for loop to do so.";
With this string variable, I want to wrap the text once it is more than 60 characters wide, breaking at a space near that 60-character mark.
Can someone please provide me with the code, or any help in creating something like this?
A basic approach to solving this is to keep track of the last space in a segment of the string before the 60th character in that segment.
Since this is homework, I'll let you come up with the code; however, here's some rough pseudo-code of the above suggestion:
- current_position = start of the string
- WHILE current_position NOT past the end of the string
  - LOOP 1...60 from the current_position (also don't go past the end of the string)
    - IF the character at the loop position is a space, set the space_position to this position
  - Replace the character (the space) at the space_position with a newline
  - Set the current_position to the next character after the space_position
- If you're printing the string rather than inserting newline characters into it, you would print any remaining part of the string here.
You might also want to consider the case where you don't have any spaces in a block of 60 characters.
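The pseudo-code above could be turned into something like the following Python sketch. Python is used only for illustration, and the name wrap_text is made up here. Two assumptions that are not in the original: the scan also looks at the character just past the 60th so that a line may be exactly 60 characters long, and when a block contains no space at all the sketch breaks at the next space it can find (leaving that one line too long) rather than splitting a word.

```python
def wrap_text(text, width=60):
    """Replace selected spaces with newlines so lines are at most `width` wide."""
    chars = list(text)
    pos = 0
    # Only keep wrapping while the remaining text does not fit on one line.
    while len(chars) - pos > width:
        space_pos = -1
        # Remember the last space within the current window.
        for i in range(pos, pos + width + 1):
            if chars[i] == " ":
                space_pos = i
        if space_pos == -1:
            # Assumption: no space in this block, so break at the next space.
            for i in range(pos + width + 1, len(chars)):
                if chars[i] == " ":
                    space_pos = i
                    break
            if space_pos == -1:
                break  # no spaces left at all; leave the rest unbroken
        chars[space_pos] = "\n"
        pos = space_pos + 1
    return "".join(chars)
```

Calling wrap_text(paragraphy) returns the wrapped string; printing it shows one line per segment.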
I have a string, and I want to extract, using regular expressions, groups of characters that are between the character : and the other character /.
Typically, here is an example of the strings I'm getting:
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
So I want to retrieve 45.72643,4.91203 and also hereanotherdata,
as they are both between the characters : and /.
I tried this syntax on a simpler string where the pattern occurs only once:
[tt]=regexp(str,':(\w.*)/','match')
tt = ':45.72643,4.91203/'
But it works only if the pattern occurs once. If I use it on a string containing the pattern multiple times, I get everything between the first : and the last /.
How can I indicate that the pattern will occur multiple times, and how can I retrieve each occurrence?
Use lookaround and a lazy quantifier:
regexp(str, '(?<=:).+?(?=/)', 'match')
Example (Matlab R2016b):
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = regexp(str, '(?<=:).+?(?=/)', 'match')
result =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'
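For readers outside MATLAB, the same lookbehind, lazy quantifier and lookahead can be tried in Python's re module. This is only a sketch of the idea; the function name between_colon_and_slash is hypothetical.

```python
import re

def between_colon_and_slash(text):
    # (?<=:)  the match must be preceded by ':'
    # .+?     as few characters as possible (lazy)
    # (?=/)   the match must be followed by '/'
    return re.findall(r"(?<=:).+?(?=/)", text)
```

Because the quantifier is lazy, each match stops at the first '/' it reaches instead of running on to the last one, which is what the greedy .* in the question did.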
In most languages this is hard to do with a single regexp. Ultimately you'll only ever get back the one string, and you want to get back multiple strings.
I've never used Matlab, so it may be possible in that language, but based on other languages, this is how I'd approach it...
I can't give you the exact code, but a search indicates that in Matlab there is a function called strsplit, example...
C = strsplit(data,':')
That should break your original string up into an array of strings, using the ":" as the break point. You can then ignore the first array index (as it contains text before a ":"), loop over the rest of the array, and use a regexp to extract everything that comes before a "/".
So for instance...
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
Breaks down into an array with parts...
1 - 'abcd'
2 - '45.72643,4.91203/Rou'
3 - 'hereanotherdata/defgh'
Then ignore 1, and extract everything before the "/" in 2 and 3.
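The split-then-trim idea can be sketched in Python as follows (not MATLAB; the name extract_by_splitting is made up for illustration). One assumption beyond the description above: pieces that contain no "/" at all are skipped rather than returned whole.

```python
def extract_by_splitting(text):
    # Split at ':' and drop the first piece, which precedes any ':'.
    pieces = text.split(":")[1:]
    # Keep what comes before the first '/' in each remaining piece.
    # Assumption: a piece without any '/' is not a match and is skipped.
    return [piece.split("/", 1)[0] for piece in pieces if "/" in piece]
```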
As John Mawer and Adriaan mentioned, strsplit is a good place to start. You can split on both ':' and '/' at once, but then you cannot tell which delimiter produced each piece. If you apply strsplit twice, you keep track of where each ':' was:
A='abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
B=cellfun(@(x) strsplit(x,'/'),strsplit(A,':'),'uniformoutput',0);
Now each cell of B after the first holds the text that followed a ':', itself split into cells at each '/'. To extract the matches, keep the cells of B that contain more than one piece (those that had a '/') and take the first piece of each:
C=cellfun(@(x) x{1},B(cellfun('length',B)>1),'uniformoutput',0)
C =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'
Starting in R2016b you can use extractBetween:
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = extractBetween(str,':','/')
result =
2×1 cell array
{'45.72643,4.91203'}
{'hereanotherdata' }
If all your text elements have the same number of delimiters this can be vectorized too.
So I have some code which reads from a file and separates each line at the commas. However, some fields in the file have spaces before or after the commas, which causes problems when the code runs.
This is the code that reads in the data from the file. Keeping the same kind of format, I was wondering if there is a way to allow for these spaces.
while(getline(inFile, line)){
    stringstream linestream(line);
    // each field of the inventory file
    string type;
    string code;
    string countRaw;
    int count;
    string priceRaw;
    int price;
    string other;
    //
    if(getline(linestream,type,',') && getline(linestream,code,',')
       && getline(linestream,countRaw,',') && getline(linestream,priceRaw,',')){
        // optional other
        getline(linestream,other,',');
        count = atoi(countRaw.c_str());
        price = atoi(priceRaw.c_str());
        StockItem *t = factoryFunction(code, count, price, other, type);
        list.tailAppend(t);
    }
}
A better approach for this kind of problem is a state machine, where each character you read is handled by a simple rule. You don't say whether you need to keep spaces between words that are not separated by commas, so I assume you do. I also don't know what you want to do with double spaces, so I assume they should be kept as they are.
Read one character at a time and keep two variables: the start position and the limit position. You begin in state 1, looking for the start position. When you find any character other than a space, set the start position to that character (and the limit position to the position just after it) and switch to state 2. In state 2, whenever you find a non-space character, set the limit position to the position just after that character. When you find a comma, take the string that runs from start to limit and switch back to state 1.
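A minimal Python sketch of that state machine is shown below, for illustration only (the question's code is C++, and the name split_trimmed is hypothetical). It assumes that only the space character counts as padding, that a comma seen while still in state 1 yields an empty field, and that the text after the last comma is also a field.

```python
def split_trimmed(line, delimiter=","):
    fields = []
    start = None  # None means state 1: still looking for the field's first character
    limit = 0     # one past the last non-space character seen in this field
    for i, ch in enumerate(line):
        if ch == delimiter:
            # End of field: emit start..limit and go back to state 1.
            fields.append(line[start:limit] if start is not None else "")
            start = None
        elif ch != " ":
            if start is None:
                start = i  # first non-space character: enter state 2
            limit = i + 1
    # Assumption: whatever follows the last delimiter is the final field.
    fields.append(line[start:limit] if start is not None else "")
    return fields
```

Spaces inside a field, including double spaces, are left untouched because only the start and limit positions move.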
Let's say there is a certain way of encrypting strings:
Append the character $, which is the first character in the alphabet, at the end of the string.
Form all the strings we get by continuously moving the first character to the end of the string.
Sort all the strings we have gotten into alphabetical order.
Form a new string by appending last character of each string to it.
For example, the word FRUIT is encrypted in the following manner:
We append the character $ at the end of the word:
FRUIT$
We then form all the strings by moving the first character at the end:
FRUIT$
RUIT$F
UIT$FR
IT$FRU
T$FRUI
$FRUIT
Then we sort the new strings into alphabetical order:
$FRUIT
FRUIT$
IT$FRU
RUIT$F
T$FRUI
UIT$FR
The encrypted string:
T$UFIR
Now my problem is obvious: how do I decrypt a given string into its original form?
I've been pounding my head for half a week now and I've finally run out of paper.
How should I get on with this?
What I have discovered:
if we have the last step of the encryption:
$FRUIT
FRUIT$
IT$FRU
RUIT$F
T$FRUI
UIT$FR
We can know the first and last character of the original string, since the rightmost column is the encrypted string itself, and the leftmost column is always in alphabetical order. The last character is the first character of the encrypted string, because $ is always first in the alphabet, and it only exists once in a string. Then, if we find the $ character from the rightmost column, and look up the character on the same row in the leftmost column, we get the first character.
So what we can know about the encrypted string T$UFIR is that the original string is F***T$, where * is an unknown character.
That is where my ideas end. Now I have to turn to the world-wide web and ask another human being: how?
You could say this is homework, and being familiar with my tutor, I place my bets on this being a dynamic programming problem.
This is the Burrows-Wheeler transform.
It's an algorithm typically used for aiding compression algorithms, as it tends to group together common repeating phrases, and is reversible.
To decode your string:
Number each character:
T$UFIR
012345
Now sort, retaining the numbering. If characters repeat, you use the indices as a secondary sort-key, such that the indices for the repeated characters are kept in increasing order, or otherwise use a sorting algorithm that guarantees this.
$FIRTU
134502
Now we can decode. Start at the '$', and use the associated index as the position, in the sorted row, of the next character to output ('$' = 1, so the next char is the one at position 1, 'F'. 'F' is 3, so the next char is 'R', etc...)
The result:
$FRUIT
So just remove the marker character, and you're done.
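The numbering, stable sort, and index-following steps above can be sketched in Python as follows. The function names bwt_encode and bwt_decode are made up for illustration, and the sketch assumes the marker '$' sorts before every other character in the text and occurs exactly once.

```python
def bwt_encode(text, marker="$"):
    # Build every rotation of text + marker, sort them, take the last column.
    s = text + marker
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

def bwt_decode(encoded, marker="$"):
    # Number each character, then sort by character. Python's sort is stable,
    # so repeated characters keep their indices in increasing order.
    order = sorted(range(len(encoded)), key=lambda i: encoded[i])
    sorted_chars = [encoded[i] for i in order]
    # Start at the marker and follow each index to the next character.
    pos = sorted_chars.index(marker)
    out = []
    for _ in range(len(encoded)):
        out.append(sorted_chars[pos])
        pos = order[pos]
    # The walk yields marker + original text, so drop the leading marker.
    return "".join(out)[1:]
```

For the example in the question, order is [1, 3, 4, 5, 0, 2] and sorted_chars spells $FIRTU, matching the two rows written out above.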
I have a huge string (22000+ characters) of encoded text. The code consists of digits [0-9] and lower case letters [a-z]. I need a regular expression to insert a space after every 4 characters, and one to insert a line break [\n] after every forty characters. Any ideas?
Many people would prefer to do this with a for loop and string concatenation, but I hate those substring calls. I am really against using regexes when they aren't the right tool for the job (parsing HTML), but I think it'd be pretty easy to work with in this case.
Let's say you have the string
var str = "aaaabbbbccccddddeeeeffffgggghhhhiiiijjjjkkkkllllmmmmnnnnoooo";
To insert a space after every four characters, and a newline after every 40 characters, you could use the following code:
str.replace(/.{4}/g, function (value, index) {
    return value + (index % 40 == 36 ? '\n' : ' ');
});
Note that this wouldn't work if the newline(40) index wasn't a multiple of the space index(4)
I abstracted this in a project, here's a simple way to do it
/**
 * Adds padding and newlines into a string without whitespace
 * @param {string} str The string to be modified (any whitespace will be stripped)
 * @param {number} spaceEvery Number of characters before inserting a space
 * @param {number} wrapEvery Number of groups per line; a newline replaces the space after the last group
 * @return {string} The replaced string
 */
function addPadding(str, spaceEvery, wrapEvery) {
    var regex = new RegExp(".{" + spaceEvery + "}", "g");
    // Add a space every {spaceEvery} chars, and a newline instead after every {wrapEvery} groups
    return str.replace(/[\n\s]/g, '').replace(regex, function(value, index) {
        // index is the offset at which the current group starts
        var newlineIndex = spaceEvery * (wrapEvery - 1);
        return value + ((index % (spaceEvery * wrapEvery) === newlineIndex) ? '\n' : ' ');
    });
}
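The same replace-with-a-callback idea carries over to other regex libraries. As a rough sketch in Python (not the answer's JavaScript; add_padding and its default arguments are assumptions made for illustration), re.sub accepts a function that receives each match, and match.start() plays the role of the index argument:

```python
import re

def add_padding(text, space_every=4, wrap_every=10):
    # Strip existing whitespace so match offsets count only real characters.
    stripped = re.sub(r"\s", "", text)
    line_len = space_every * wrap_every
    newline_offset = space_every * (wrap_every - 1)

    def with_separator(match):
        # match.start() is the offset at which the current group begins.
        at_line_end = match.start() % line_len == newline_offset
        return match.group(0) + ("\n" if at_line_end else " ")

    return re.sub(".{%d}" % space_every, with_separator, stripped)
```

As in the JavaScript version, a separator is appended after every complete group, so the result can end with a trailing space or newline.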
Well, a regexp in itself doesn't insert a space, so I'll assume you have some command in whatever language you're using that inserts based on finding a regexp.
So, finding 4 characters and finding 40 characters: that's not pretty in general regular expressions (unless your particular implementation has a nice way to express repetition counts, such as the .{4} quantifier that most modern engines support). For finding 4 characters, use
....
Because typical regexp engines use maximal munch and then resume searching from the end of the previous match, munching maximally again, that will chunk your string into 4-character pieces. The ugly part is that in standard regular expressions, you'll have to use
........................................
to find chunks of 40 characters, although I'll note that if you run your 4 character one first, you'll have to run
..................................................
or
.... .... .... .... .... .... .... .... .... ....
to account for the spaces you've already put in.
The period matches any character, but given that you're only using [0-9a-z], you could use that character class in place of each period if you need to ensure nothing else slipped in; I was just avoiding making it even more gross.
As you may be noticing, regexps have some limitations. Take a look at the Chomsky hierarchy to really get into their theoretical limitations.
Say I have a CString object strMain="AAAABBCCCCCCDDBBCCCCCCDDDAA";
I also have two smaller strings, say strSmall1="BB";
strSmall2="DD";
Now, I want to replace every string that occurs between strSmall1 ("BB") and strSmall2 ("DD") in strMain with, say, "KKKKKKK".
Is there a way to do it without regex? I cannot use regex, as adding another file to the project is prohibited.
Is there a way in VC++/MFC to do it? Or any easy algorithm you can point me to?
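// Replace the text between the first strSmall1 and the next strSmall2.
int begin = strMain.Find(strSmall1, 0);
int end = (begin < 0) ? -1 : strMain.Find(strSmall2, begin + strSmall1.GetLength());
if (end >= 0) {
    begin += strSmall1.GetLength();        // first character after "BB"
    CString left = strMain.Left(begin);    // everything up to and including "BB"
    CString right = strMain.Mid(end);      // "DD" and everything after it
    strMain = left + _T("KKKKKKK") + right;
}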
The easiest way is probably to handle the replacement recursively. Search for the starting delimiter and the ending delimiter. If you find them, put together a new string consisting of the string up to the starting delimiter, followed by the replacement string, followed by the return from recursively doing the replacement in the remainder of the string following the ending delimiter.
That, of course, assumes you want to replace all the occurrences in the main string -- if you only want to replace the first one, John Weldon's solution (for one example) will work quite nicely.
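The recursive approach might look like the following Python sketch (for illustration only; it is not MFC code, and the name replace_between is hypothetical). It assumes the delimiters themselves are kept and only the text between them is replaced, which is how the question reads.

```python
def replace_between(text, start, end, replacement):
    begin = text.find(start)
    if begin == -1:
        return text  # no starting delimiter left
    begin += len(start)
    stop = text.find(end, begin)
    if stop == -1:
        return text  # starting delimiter without a matching end
    rest = text[stop + len(end):]
    # Head (including the start delimiter) + replacement + end delimiter,
    # then recurse on whatever follows the end delimiter.
    return text[:begin] + replacement + end + replace_between(rest, start, end, replacement)
```

Each call handles one occurrence, so a very long input with many occurrences would be better served by an equivalent loop.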
Pseudocode:
loop over string
    if curlocation matches string strsmall1, save index and break
loop over remaining string
    replace till curlocation matches string strsmall2
Extra credit:
What will the next assignment be?
My answer:
Speed it up by jumping the length of strsmall1 and strsmall2 in loop iterations