yahoo pipes trimming all item titles - regex

After a lot of hard work, I have created two yahoo Pipes I will be using.
One of them has a minor problem however... I am trimming the title length down to leave enough room for a ... and a link to fit within a tweet.
It trims the first post correctly... however it trims all of the posts after that to 0 length (before adding a bit of extra text to the end).
The problem is I'm not using a loop for all items after a certain point, but the reason for that is the output is always items from a loop, and I need the output to be number at a certain point so that I can feed in that number asa variable to trim the length by. The pipe can be found here: http://pipes.yahoo.com/pipes/pipe.info?_id=3e6c3c6b2d23d8ce0cf66cb3efc5fb56
Typically, I am inserting any RSS feed in the top box, something like "new blog post:" in the middle and "#bussiness #hashtags" in the last box.
If you can see any way I can have this yahoo pipe work for all posts rather than just the top one, please let me know. its not a big deal as im only ever posting for the moment, the top post to twitter... however there may come a point where I need all of them looking the same.

Related

Find All String Occurrences, Except The Last One Found, and Remove Them

I am using Google Docs to open Walmart receipts that I email to myself. The Walmart store that I use 99.9% of the time seems to have made some firmware update to the Ingenico POS terminal that makes it display a running SUBTOTAL after each item is identified by the scanner. Here are some images to support my question..
The POS terminal looks like this:
Second image is the is the electronic receipt which I email myself from their IOS app. It is presumably taken from the POS terminal because it has the extra running SUBTOTAL lines after each item like the POS terminal screen shows. It has been doing this for a few months and I've been given no reason to believe, by management, that it will be corrected any time soon.
The final image is my actual paper receipt. This is printed from the register, its the one that you walk out with it and show the greeter/exit person to check your buggy and the items you've purchased.
Note that it does not show the extra SUBTOTAL.
I open the electronic receipt in a Google Document and their automatic OCR spits out the text of the receipt. It does a pretty darn good job, I'd say its 95%+ accurate with these receipts. I apply a very crude little regex that reformats these electronic receipts so that I can enter them into a database and use that data for my family's budgeting, taxes, and so forth. That has been working very well for me, albeit I would like to further automate that process but thats for a different question some day perhaps.
Right now, that little crude regex no longer formats the receipt into something usable for me.
What I would like to do is to remove the extra SUBTOTALS from the (broken) electronic receipt but leave the last SUBTOTAL alone. I highlighted the last SUBTOTAL on the receipt, which is always there, and should remain.
I have seen two other questions that are similar but I could not apply them to my situation. One of them was:
Remove all occurrences except the last one
What have I tried?
The following regex works in the online tester at regex101.com:
\nSUBTOTAL\t\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})
It took me a while to come up with that regex from searching around but essentially I want it to find all of the SUBTOTAL literals with a preceding new-line and any decimal number amount from 0.01 to 999.99) and I just want to replace what that finds with a new-line and then I can allow my other regex creation to work on that like it used to before the firmware update to the POS terminal.
The regex correctly identifies every SUBTOTAL (including the last one) on the regex101.com site. I can apply a substitution of "\n" and I am back to seeing the receipt data I can work with but there were two issues:
1) I cant replicate this using Google Apps Script.
Here is my example:
function myFunction() {
var body = DocumentApp.getActiveDocument().getBody();
var newText = body.getText()
.match('\nSUBTOTAL\t\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})')[1]
.replace(/%/mgi, "%\n");
body.clear();
body.setText(newText);
}
2) If I were to get the above code to work, I still have the issue of wanting to leave the last SUBTOTAL intact.
Here is a Google Doc that I have set up to experiment with:
https://docs.google.com/document/d/11bOJp2rmWJkvPG1FCAGsQ_n7MqTmsEdhDQtDXDY-52s/edit?usp=sharing
I use this regular expresion.
// JavaScript Syntax
'/\nSUBTOTAL\s\d{1,3}\.\d{2}| SUBTOTAL\n\d{1,3}\.\d{2}/g'
Also I make a script for google docs. You can use this Google Doc and see the results.
function deleting_subs() {
var body = DocumentApp.getActiveDocument().getBody();
var newText = body.getText();
var out = newText.replace(/\nSUBTOTAL\s\d{1,3}\.\d{2}|` SUBTOTAL\n\d{1,3}\.\d{2}/g, '');
// This is need to become more readable the resulting text.
out = out.replace(/R /g, 'R\n');
body.clear();
body.setText(out);
}
To execute the script, open the google doc file and click on:
Add ons.
Del_subs -> Deleting Subs.
Tip: After execute the complement/add on (Deleting Subs), undo the document edition, in that way other users can return to previous version of the text.
Hope this help to you.

Merging cells in openoffice deleting specific whitespace

Ok, so what I have is going to be 3 cells of data that I need to merge into a link to pictures in my store. What I am looking for is an easy way to do this without double clicking and cntl v pasting for 4x at 100+ lines per page...
Cell 1. Cell 2. Cell 3
Assets/ name. .jpg
Needs to be.... assets/name.jpg
This seems simple, but the problem is most of the names are 2 words and even the single word names when merged look like this...... assets/ name name .jpg
Giving me a space after/ and a space after the second name. If the "name" I am merging with has 2 or more parts I still need to keep those spaces intact or the link will not work the way it's set up currently. I may need to rename the pictures into 1 solid word just for linking purposes, but hoping to avoid an extra step.
Is there a way to merge and remove the spaces I need gone to create the link? I have done a couple pages the hard way, not fun when I have 200+ pages to do.
Any help is appreciated.
Thank you.
Jerry
It seems to me possible that an answer to a completely different Q may be of interest to you:
=TRIM(LOWER((A1))&TRIM(A2)&TRIM(A3))

c++ read from a file and output sorted content

I am learning to code and found interesting task, but i do not know where to start in solving it. So i have a file with some titles and comments which need to be placed under the right title. So the first line of the input contains a number N which determines the quantity of the titles. Each row starts with a unique article id (integer), followed by the title in quotation marks. After there is no more titles, comments are given. At the beginning there is Title ID and comment (one word), but comments may recur for the same ID. so here is a structure of a file:
<N>
<ID1> "<Title1>"
...
<IDN> "<TitleN>"
<ID1> <Comment1>
...
<IDK> <CommentK>
Now in the output file each Title has two lines - first for the title and second one for comments. Titles must be in ascending order. And comments should be in reverse order (newest comments in the beginning) Structure of output file:
<Title1>
<Comment11> ... < CommentK1>
...
<TitleN>
<Comment1N> ... < CommentLN>
Example:
input:
3
1 "This is some title"
3 "Another title"
2 "And one more"
1 COmment
1 Another
3 Great
2 Awesome
3 Lucky
2 Stanley
output
This is some title
Another COmment
And one more
Stanley Awesome
Another Title
Lucky Great
I do not now where to begin with.. Should I use arrays to save the data in memory and then try to sort it in the right pattern?Or is it better to load the text file into a data structure; in this case a linked list? Maybe someone can guide me in the right direction how to accomplish this task. (I do not ask to code it for me, just guide me or give some algorithm, it would be highly appreciated). Thanks!
I assume you know how to read a file in C++, if not, please look at it, for example on this tutorial.
For the sorting part, you could use a STL container
to store the ids. I would recommend a map with the id as key and the string as value.
The advantage of the map, is that it's already sorted (ascending order).
If you use another container, you should look at the sorting algorithms if you want to understand how they work. For instance bubble sort, selection sort, quick sort or merge sort for the main ones.
However if you want to sort without any implementation, have a look at this.
This doesn't provide you a specific answer for your problem, but it can help you start.
[UPDATE]
I didn't read correctly and I haven't seen that multiple lines could have the same ID. A map would not necessarily be the most suitable container.

Comparing two documents

I have two very large lists. They both were originally in excel, but the larger one is a list of emails (about 160,000) of them with other information like their name and address etc. And the smaller one is a list of just 18,000 emails.
My question is what would be the easiest way to get rid of all 18,000 rows from the first document that contain the email addresses from the second?
I was thinking regex or maybe there is another application I can use? I have tried searching online but it seems like there isn't much specific to this. I also tried notepad++ but it freezes when I try to compare these large files.
-Thank You in Advance!!
Good question. One way I would tackle this is making a C++ program [you could extrapolate the idea to the language of your choice; You never mentioned which languages you were proficient in] that read each item of the smaller file into a vector of strings. First, of course, use Excel to save the files as CSV instead of XLS or XLSX, which will comma-separate the values so you can work with them easier. For the larger list, "Save As" a copy of just email addresses, deleting the other rows for now.
Then, you could open the larger list and use a nested loop to check if you should output to an output file. Something like:
bool foundMatch=false;
for(int y=0;y<LargeListVector.size();y++) {
for(int x=0;x<SmallListVector.size();x++) {
if(SmallListVector[x]==LargeListVector[y]) foundMatch=true;
}
if(!foundMatch) OutputVector.append(LargeListVector[y]);
foundMatch=false;
}
That might be partially pseudo-code, but do you get the idea?
So I read a forum post at : Here
=MATCH(B1,$A$1:$A$3,0)>0
Column B would be the large list, with the 160,000 inputs and column A was my list of things I needed to delete of 18,000.
I used this to match everything, and in a separate column pasted this formula. It would print out either an error or TRUE. If the data was in both columns it printed out true.
Then because I suck with excel, I threw this text into Notepad++ and searched for all lines that contained TRUE (match case, because in my case some of the data had the word true in it without caps.) I marked those lines, then under search, bookmarks, I removed all lines with bookmarks. Pasted that back into excel and voila.
I would like to thank you guys for helping and pointing me in the right direction :)

How to iterate over all the page breaks in an Excel 2003 worksheet via COM

I've been trying to retrieve the locations of all the page breaks on a given Excel 2003 worksheet over COM. Here's an example of the kind of thing I'm trying to do:
Excel::HPageBreaksPtr pHPageBreaks = pSheet->GetHPageBreaks();
long count = pHPageBreaks->Count;
for (long i=0; i < count; ++i)
{
Excel::HPageBreakPtr pHPageBreak = pHPageBreaks->GetItem(i+1);
Excel::RangePtr pLocation = pHPageBreak->GetLocation();
printf("Page break at row %d\n", pLocation->Row);
pLocation.Release();
pHPageBreak.Release();
}
pHPageBreaks.Release();
I expect this to print out the row numbers of each of the horizontal page breaks in pSheet. The problem I'm having is that although count correctly indicates the number of page breaks in the worksheet, I can only ever seem to retrieve the first one. On the second run through the loop, calling pHPageBreaks->GetItem(i) throws an exception, with error number 0x8002000b, "invalid index".
Attempting to use pHPageBreaks->Get_NewEnum() to get an enumerator to iterate over the collection also fails with the same error, immediately on the call to Get_NewEnum().
I've looked around for a solution, and the closest thing I've found so far is http://support.microsoft.com/kb/210663/en-us. I have tried activating various cells beyond the page breaks, including the cells just beyond the range to be printed, as well as the lower-right cell (IV65536), but it didn't help.
If somebody can tell me how to get Excel to return the locations of all of the page breaks in a sheet, that would be awesome!
Thank you.
#Joel: Yes, I have tried displaying the user interface, and then setting ScreenUpdating to true - it produced the same results. Also, I have since tried combinations of setting pSheet->PrintArea to the entire worksheet and/or calling pSheet->ResetAllPageBreaks() before my call to get the HPageBreaks collection, which didn't help either.
#Joel: I've used pSheet->UsedRange to determine the row to scroll past, and Excel does scroll past all the horizontal breaks, but I'm still having the same issue when I try to access the second one. Unfortunately, switching to Excel 2007 did not help either.
Experimenting with Excel 2007 from Visual Basic, I discovered that the page break isn't known unless it has been displayed on the screen at least once.
The best workaround I could find was to page down, from the top of the sheet to the last row containing data. Then you can enumerate all the page breaks.
Here's the VBA code... let me know if you have any problem converting this to COM:
Range("A1").Select
numRows = Range("A1").End(xlDown).Row
While ActiveWindow.ScrollRow < numRows
ActiveWindow.LargeScroll Down:=1
Wend
For Each x In ActiveSheet.HPageBreaks
Debug.Print x.Location.Row
Next
This code made one simplifying assumption:
I used the .End(xlDown) method to figure out how far the data goes... this assumes that you have continuous data from A1 down to the bottom of the sheet. If you don't, you need to use some other method to figure out how far to keep scrolling.
Did you set ScreenUpdating to True, as mentioned in the KB article?
You may want to actually toggle it to True to force a screen repaint. It sounds like the calculation of page breaks is a side-effect of actually rendering the page, rather than something Excel does on demand, so you have to trigger a page rendering on the screen.