I am learning to code and found an interesting task, but I do not know where to start. I have a file with some titles and comments, and each comment needs to be placed under the right title. The first line of the input contains a number N, the number of titles. Each of the next N rows starts with a unique article ID (an integer), followed by the title in quotation marks. After the titles come the comments. Each comment row starts with a title ID followed by the comment (one word), and the same ID may appear on several comment rows. Here is the structure of the file:
<N>
<ID1> "<Title1>"
...
<IDN> "<TitleN>"
<ID1> <Comment1>
...
<IDK> <CommentK>
In the output file each title takes two lines: the first for the title and the second for its comments. Titles must be in ascending order (by ID, as in the example below), and comments must be in reverse order (newest first). Structure of the output file:
<Title1>
<Comment11> ... <CommentK1>
...
<TitleN>
<Comment1N> ... <CommentLN>
Example:
input:
3
1 "This is some title"
3 "Another title"
2 "And one more"
1 COmment
1 Another
3 Great
2 Awesome
3 Lucky
2 Stanley
output:
This is some title
Another COmment
And one more
Stanley Awesome
Another title
Lucky Great
I do not know where to begin. Should I use arrays to hold the data in memory and then sort it into the right order? Or is it better to load the text file into a data structure, in this case a linked list? Maybe someone can point me in the right direction. (I am not asking anyone to code it for me; guidance or an algorithm would be highly appreciated.) Thanks!
I assume you know how to read a file in C++; if not, please look it up, for example in this tutorial.
For the sorting part, you could use an STL container to store the IDs. I would recommend a map with the ID as key and the string as value.
The advantage of a map is that it is already sorted by key (ascending order).
If you use another container, you should look at sorting algorithms if you want to understand how they work, for instance bubble sort, selection sort, quick sort, or merge sort for the main ones.
However, if you want to sort without implementing anything yourself, have a look at this.
This doesn't give you a specific answer to your problem, but it can help you get started.
[UPDATE]
I didn't read the question carefully and missed that multiple lines can have the same ID. A map would not necessarily be the most suitable container.
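One way to reconcile the two points above, offered only as a sketch: keep the map keyed by ID, but store a small record per ID that holds the title and a vector of comments. The names `Article` and `regroup` are made up for this illustration, and the code assumes the input is well formed as described in the question.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// One article: its title plus the comments in the order they were read.
struct Article {
    std::string title;
    std::vector<std::string> comments;
};

// Reads the input format from the question and writes the output format.
void regroup(std::istream& in, std::ostream& out) {
    int n = 0;
    in >> n;

    std::map<int, Article> articles;  // std::map iterates keys in ascending order
    for (int i = 0; i < n; ++i) {
        int id = 0;
        in >> id;
        std::string rest;
        std::getline(in, rest);  // remainder of the line, e.g.  "Some title"
        Article& article = articles[id];
        const auto first = rest.find('"');
        const auto last = rest.rfind('"');
        if (first != std::string::npos && last > first) {
            article.title = rest.substr(first + 1, last - first - 1);
        }
    }

    int id = 0;
    std::string comment;
    while (in >> id >> comment) {
        auto it = articles.find(id);
        if (it != articles.end()) {  // comments for unknown IDs are skipped
            it->second.comments.push_back(comment);
        }
    }

    for (const auto& entry : articles) {
        const Article& article = entry.second;
        out << article.title << '\n';
        // Walk the comments backwards so the newest one comes first.
        for (auto c = article.comments.rbegin(); c != article.comments.rend(); ++c) {
            if (c != article.comments.rbegin()) out << ' ';
            out << *c;
        }
        out << '\n';
    }
}
```

The function takes generic streams, so it can be called with a `std::ifstream` and `std::ofstream` for real files or with string streams while experimenting.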
I have a couple of rather large nested IF formulas in my spreadsheet, and it sure would be nice to have an alternative method. The problem is that I'm using a wildcard (*) in my lookup because the source text has slight variations (the date, for example).
For example, if my list of data contains:
VENMO PAYMENT 220828 1022093447487 BRENDA HOSPY
VENMO PAYMENT 220813 1031323447487 BRENDA HOSPY
I want these to show in an adjacent column of cells as just Venmo.
Currently my IF formula in that second column of cells is:
=IF(COUNTIF($F10,"*APPLE.COM/BILL*"),"AP",
IF(COUNTIF($F10,"IIA VOYA*"),"VOYA",
IF(COUNTIF($F10,"VENMO PAYMENT*"),"Venmo",
IF(COUNTIF($F10,etc...
This works fine but quickly gets unruly as more things get added.
I've spent a great deal of time searching for functions and processes that would make this easier, or at least more compact, but I can't find a way with typical functions like VLOOKUP or INDEX/MATCH.
If I've explained this comprehensibly, perhaps you've seen a similar situation and could offer a suggestion. It would be appreciated!
I'm not opposed to using a programming function.
I've looked at various Excel functions and combinations of them, on my own and online, with no luck.
I have created a structure as below: the source text is in column A, the search strings are in E2:E9, and the values to return are in F2:F9.
The formula in B2 is:
=IFERROR(INDEX($F$2:$F$9,MIN(IF(COUNTIF(A2,"*"&$E$2:$E$9&"*")>0,ROW($E$2:$E$9),9999999)-1)),"---")
Enter it as an array formula using Ctrl+Shift+Enter.
It searches A2 for every string in column E and, for each string that matches, returns the row number of that string in column E. MIN then picks the first matching row; for strings that do not match, the IF returns 9999999 instead. Because the data starts in row 2, I subtract 1 so that the row number becomes the position within the range. INDEX then returns the value at that position in column F. Finally, IFERROR shows --- where no match was found and 9999999 was returned.
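Since the question mentions being open to a programming function, the same first-match-in-a-lookup-table logic can be sketched outside the spreadsheet. This is only an illustration in C++: `categorize` and `upper` are hypothetical names, the table of (search string, label) pairs stands in for columns E and F, and wildcard characters inside the search strings are not interpreted.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Upper-cases a copy of the text so matching ignores case, as COUNTIF does.
std::string upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Returns the label of the first search string contained in `text`, or "---".
std::string categorize(const std::string& text,
                       const std::vector<std::pair<std::string, std::string>>& table) {
    const std::string haystack = upper(text);
    for (const auto& row : table) {
        if (haystack.find(upper(row.first)) != std::string::npos) {
            return row.second;  // first matching row wins, like MIN over row numbers
        }
    }
    return "---";  // same placeholder the IFERROR branch produces
}
```

Adding a new category then means adding one row to the table rather than another nested IF.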
I'm trying to design an application that manipulates a list of thousands of individual words stored in a txt file, for the following tasks:
1- Randomly picking some words.
2- Checking whether words entered by the user are actually in the list.
3- Retrieving the entire list from a txt file and storing it temporarily for subsequent manipulations.
I'm not asking for an implementation or for pseudocode. I'm looking for a suitable approach to dealing with a massive list of words. For the time being I might go with a vector of strings; however, searching thousands of words will take some time. There must be strategies for coping with this kind of task, but since my background is not Computer Science, I don't know which direction to go in. Any suggestions are welcome.
A vector of strings is fine for this problem. Just sort them, and then you can use binary search to find a string in the list.
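A minimal sketch of that suggestion, assuming the words are already loaded into a `std::vector<std::string>`. The helper names `prepare`, `contains`, and `pickRandom` are invented for this example; the standard library calls are `std::sort`, `std::binary_search`, and `std::uniform_int_distribution`.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Sort once after loading; every later lookup can then use binary search.
void prepare(std::vector<std::string>& words) {
    std::sort(words.begin(), words.end());
}

// Requires `words` to be sorted. Needs O(log n) string comparisons.
bool contains(const std::vector<std::string>& words, const std::string& w) {
    return std::binary_search(words.begin(), words.end(), w);
}

// Picks one word uniformly at random. Requires a non-empty list.
const std::string& pickRandom(const std::vector<std::string>& words,
                              std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> dist(0, words.size() - 1);
    return words[dist(rng)];
}
```

With a few thousand words each lookup needs only about a dozen comparisons, so the vector is unlikely to be the bottleneck.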
Radix trees are a good solution for searching through word lists for matches. Reduced space for storage, but you'll have to have some custom code for getting and putting words in the list. And the text file won't necessarily be easy to read unless you create the tree anew each time you load from a text file. Here's an implementation I committed to on GitHub (I can't even remember the source material at this point) that might be of assistance to you.
So my professor assigned this project for us. It's pretty simple because it's our first hashing program. The program is to take 15 names as input, hash them, and store them in something. I did it in a vector. Once they are hashed, the user enters another name, and the program should hash that name and try to match it to one in the vector. Maybe it is me or maybe it is how he wrote the question, but I'm a little confused. Our program is supposed to have collision protection. That means we run an algorithm on the name entered, it spits out a number, and that number is where we store the name in the vector. If there is another name already in that spot, we are supposed to store it in the next available spot.
So let's say I enter the names jordan and jon. The algorithm will tell me to store these in the same spot (let's say spot 8), but collision protection will recognize that jordan is already taking up spot 8 and will move jon to the next available spot (let's say spot 9). Now when the user is entering names to see whether they are already in the vector and enters jon, the algorithm will say it should be in spot 8. Should I just check whether that spot is empty and, if it is not, say a match has been found in spot 8, even though the name in spot 8 is jordan and the name entered is jon? Or should I start at spot 8 and see if the strings match, and if they don't, check the next spot and so on until I return to the original spot or find the match?
I wrote the program and it works fine; I just ran into this one dilemma and can't finish the program. Thanks
The collision handling you describe is called open addressing (with linear probing).
With this approach, when you are searching for an element you should keep going until you either find it or reach the first empty spot; only at an empty spot are you guaranteed that the requested element is not stored.
Three things can happen when you look up a value:
1. You find the value you're looking for.
2. You find an empty spot in your hash table.
3. You find some other value in the assigned spot you're looking at.
In cases 1 and 2, it's clear what you should do. You know for sure that the value you're looking for is either in your table or it's not. In the third case, you should follow the same procedure that you do when adding items to the table. That is, keep looking in the next spot until you reach either condition 1 or 2 above.
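The lookup described above could look roughly like this. It is a sketch under stated assumptions, not the assignment's solution: an empty string marks an unused slot, the table is non-empty, and `homeSlot` is a stand-in hash that only looks at the first letter so that jordan and jon collide as in the question. The names `Table`, `homeSlot`, `store`, and `lookup` are all made up for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical table layout: an empty string marks an unused slot.
using Table = std::vector<std::string>;

// Stand-in hash; the assignment's own algorithm would go here.
std::size_t homeSlot(const std::string& name, std::size_t tableSize) {
    return static_cast<unsigned char>(name[0]) % tableSize;
}

// Stores `name` in its home slot or the next free one. Returns false if full.
bool store(Table& table, const std::string& name) {
    const std::size_t start = homeSlot(name, table.size());
    for (std::size_t step = 0; step < table.size(); ++step) {
        const std::size_t i = (start + step) % table.size();
        if (table[i].empty() || table[i] == name) {
            table[i] = name;
            return true;
        }
    }
    return false;
}

// Returns the slot holding `name`, or -1 if it is not in the table.
long lookup(const Table& table, const std::string& name) {
    const std::size_t start = homeSlot(name, table.size());
    for (std::size_t step = 0; step < table.size(); ++step) {
        const std::size_t i = (start + step) % table.size();
        if (table[i] == name) return static_cast<long>(i);  // case 1: found
        if (table[i].empty()) return -1;                    // case 2: empty spot
        // case 3: another name lives here, so keep probing
    }
    return -1;  // probed every slot without finding the name
}
```

The search follows exactly the same probe sequence as the insertion, which is why it can stop at the first empty slot. Note that this simple scheme breaks if names are ever deleted by blanking a slot; that would need a separate "deleted" marker.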
After a lot of hard work, I have created two Yahoo Pipes that I will be using.
One of them has a minor problem, however. I am trimming the title length down to leave enough room for a "..." and a link to fit within a tweet.
It trims the first post correctly, but it trims all of the posts after that to 0 length (before adding a bit of extra text to the end).
The problem is that I'm not using a loop for all items after a certain point. The reason is that the output of a loop is always a list of items, and at a certain point I need the output to be a number so that I can feed that number in as a variable to trim the length by. The pipe can be found here: http://pipes.yahoo.com/pipes/pipe.info?_id=3e6c3c6b2d23d8ce0cf66cb3efc5fb56
Typically, I insert any RSS feed in the top box, something like "new blog post:" in the middle, and "#bussiness #hashtags" in the last box.
If you can see any way I can make this Yahoo Pipe work for all posts rather than just the top one, please let me know. It's not a big deal, as for the moment I'm only ever posting the top post to Twitter. However, there may come a point where I need all of them looking the same.
I have two very large lists. Both were originally in Excel. The larger one is a list of emails (about 160,000 of them) with other information such as name and address. The smaller one is a list of just 18,000 emails.
My question is: what would be the easiest way to get rid of all 18,000 rows in the first document that contain the email addresses from the second?
I was thinking regex, or maybe there is another application I can use? I have tried searching online, but there doesn't seem to be much specific to this. I also tried Notepad++, but it freezes when I try to compare these large files.
-Thank You in Advance!!
Good question. One way I would tackle this is to write a C++ program (you could carry the idea over to the language of your choice; you never mentioned which languages you are proficient in) that reads each item of the smaller file into a vector of strings. First, of course, use Excel to save the files as CSV instead of XLS or XLSX, which comma-separates the values so you can work with them more easily. For the larger list, "Save As" a copy containing just the email addresses, deleting the other columns for now.
Then you could open the larger list and use a nested loop to check whether each address should be written to an output file. Something like:
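// Copy every entry of the large list that is absent from the small list.
for (std::size_t y = 0; y < LargeListVector.size(); y++) {
    bool foundMatch = false;
    for (std::size_t x = 0; x < SmallListVector.size(); x++) {
        if (SmallListVector[x] == LargeListVector[y]) {
            foundMatch = true;
            break;  // no need to scan the rest of the small list
        }
    }
    if (!foundMatch) OutputVector.push_back(LargeListVector[y]);
}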
That might be partially pseudo-code, but do you get the idea?
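With 160,000 and 18,000 entries, a nested loop can need up to a few billion string comparisons. As a hedged alternative sketch, the smaller list can be loaded into a `std::unordered_set` so that each address from the larger list is checked with a single hash lookup. The function name `removeListed` is invented here, and the comparison is exact (case-sensitive, no trimming of whitespace), which may or may not suit the real data.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Returns the entries of `large` that do not appear in `toRemove`,
// preserving their original order.
std::vector<std::string> removeListed(const std::vector<std::string>& large,
                                      const std::vector<std::string>& toRemove) {
    const std::unordered_set<std::string> blocked(toRemove.begin(), toRemove.end());
    std::vector<std::string> kept;
    for (const std::string& email : large) {
        if (blocked.count(email) == 0) kept.push_back(email);
    }
    return kept;
}
```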
So I read a forum post here that suggested this formula:
=MATCH(B1,$A$1:$A$3,0)>0
Column B was the large list, with the 160,000 entries, and column A was my list of the 18,000 things I needed to delete.
I pasted this formula into a separate column to match everything. It printed either an error or TRUE: TRUE if the value was in both columns.
Then, because I'm not great with Excel, I threw this text into Notepad++ and searched for all lines that contained TRUE (with match case on, because some of my data had the word "true" in it without caps). I marked those lines, then under Search > Bookmark I removed all bookmarked lines. I pasted the result back into Excel and voila.
I would like to thank you guys for helping and pointing me in the right direction :)