C++ Parsing string to find occurrence

So I need to parse the input of the user in the following way:
If the user enters
C:\Program\Folder\NextFolder\File.txt
OR
C:\Program\Folder\NextFolder\File.txt\
Then I want to remove the file and just save
C:\Program\Folder\NextFolder\
I essentially want to find the first occurrence of \ starting from the end, and if the user entered a trailing slash, the second occurrence instead. I can decipher which of the two cases applies with this code:
input.substr(input.size()-1,1)!="/"
But I don't understand how to find the first occurrence starting from the end. Any ideas?

This
input.substr(input.size()-1,1)!="/"
is very inefficient*. Use:
if( ! input.empty() && input[ input.length() - 1 ] == '/' )
{
// something
}
Finding the first occurrence of something starting from the end is the same as finding the last occurrence starting from the beginning. You may use find_last_of or rfind. Alternatively, you can use the standard find algorithm combined with rbegin and rend.
*std::string::substr creates one substring, "/" probably creates another (depends on std::string::operator!=), compares the two strings and destroys the temp objects.
Note that
C:\Program\Folder\NextFolder\File.txt\
is not a path to a file, it's a directory.
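As a rough illustration of the rfind/find_last_of suggestion above, here is a minimal sketch. The helper name directoryOf is hypothetical, and it assumes that a single trailing separator should be ignored and that both \ and / count as separators:

```cpp
#include <string>

// Hypothetical helper: returns the directory part of a path, keeping the
// trailing separator. Treats both '\\' and '/' as separators (an assumption).
std::string directoryOf(std::string path)
{
    const char* separators = "\\/";
    // Drop one trailing separator so "File.txt\" is handled like "File.txt".
    if (!path.empty() &&
        (path[path.size() - 1] == '\\' || path[path.size() - 1] == '/'))
        path.erase(path.size() - 1);

    // The last separator that remains marks the end of the directory part.
    const std::string::size_type pos = path.find_last_of(separators);
    if (pos == std::string::npos)
        return std::string();        // no separator left: no directory part
    return path.substr(0, pos + 1);  // everything up to and including it
}
```

With this sketch, both C:\Program\Folder\NextFolder\File.txt and C:\Program\Folder\NextFolder\File.txt\ yield C:\Program\Folder\NextFolder\.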

If your input is a std::string (which I think it is), you can search it with string::find for a forward search and string::rfind for a reverse search (end to start). To check the last character you don't need substr, and you shouldn't use it, since it creates a new string instance just to check one character. You can simply write if( !input.empty() && input.back() == '/' ). Note that back() requires C++11 and must not be called on an empty string.

If you are using C++ strings, then try the reverse iterator on the strings, to write your own logic on what is acceptable and what is not. There is a clear example in the link I provided.
From what I can guess, you are trying to store the directory name given a path that could end with either a file or a directory.
If that is the case, you are better off removing the trailing '\' and checking whether the result is a directory: stop if it is, otherwise continue stripping.
Alternatively, you can try splitting the string on '\' into two parts. Some related notes here.
If those are actual file names (it looks like you are using Windows), try the _splitpath function as well.

Related

Strtok and ability to ignore usernames?

So I'm currently doing a project where we need to parse sentences (more specifically tweets) by word and store the frequencies of words and the words themselves in a vector pair (with a custom find function to increment frequencies).
I'm currently using strtok to parse the sentences and I was wondering if you could ignore any words that have a # symbol at the beginning of them. I currently have my delimiter for the strtok function as a bunch of non-useful symbols and spaces !##&()–[{}]:;',?/*\".+\\^ and it ignores them correctly, but say I have a word: #thisismyusername, is there a way to ignore the whole word, including the 'thisismyusername' and not just the #?
I've been looking for documentation on something like this but haven't found anything yet.
Here is my strtok parsing code:
char* tempMap;
tempMap = strtok (tempHolderPos," !##&()–[{}]:;',?/*\".+\\^");
*tempHolderPos is the full sentence.
Thanks guys!

You can do exactly that. For instance, something like the following will work with your strtok loop:
for (ptr = strtok (tempHolderPos, yourdelims); ptr != NULL; ptr = strtok (NULL, yourdelims)) {
    if (*ptr == '#')
        continue;
    ...
}
After getting a token from strtok you simply check whether its first character is a '#', and if so, go get the next word at this point, effectively ignoring the word beginning with '#'. For this to work, '#' must not be part of your delimiter string; otherwise strtok strips it before you can test for it.
Recall that when you dereference a character pointer, you get the character it points to. A char * token returned by strtok points at the start of the token, so dereferencing it gives you the first character. So you just dereference the pointer to your token, check whether that character is '#', and if so, go get the next word, skipping all additional processing that would be done on the token.
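A self-contained sketch of that check might look like the following. The function name wordsWithoutHashtags is hypothetical, and the delimiter set is a simplified ASCII version of the one in the question with '#' deliberately left out, since strtok would otherwise consume the '#' before the test could see it:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits a sentence into words with strtok and drops
// every token that begins with '#'. The delimiter set is an assumption and
// intentionally does not contain '#'.
std::vector<std::string> wordsWithoutHashtags(const std::string& sentence)
{
    const char* delims = " !&()[{}]:;',?/*\".+\\^";

    // strtok modifies its input, so tokenize a writable, null-terminated copy.
    std::vector<char> buffer(sentence.begin(), sentence.end());
    buffer.push_back('\0');

    std::vector<std::string> words;
    for (char* token = std::strtok(buffer.data(), delims);
         token != NULL;
         token = std::strtok(NULL, delims))
    {
        if (*token == '#')
            continue;              // skip the whole "#word" token
        words.push_back(token);
    }
    return words;
}
```

For the input "hello #thisismyusername world!" this returns the two words hello and world.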

Looking at strtok reference, I think you can't do that directly. It would be easy though to ignore any token that starts with # and just continue without saving it.

mIRC Search for multiple words in text file

I am trying to search a text file that will return a result if more than one word is found in that line. I don't see this explained in the documentation and I have tried various loops with no success.
What I would like to do is something similar to this:
$read(name.txt, s, word1|word2|word3)
or even something like this:
$read(name.txt, w, word1*|*word2*|*word3)
I don't know RegEx that well so I'm assuming this can be done with that but I don't know how to do that.

The documentation in the client itself is good, but I also recommend this site: http://en.wikichip.org/wiki/mirc. For your problem there is a nice article: http://en.wikichip.org/wiki/mirc/text_files
All the info is taken from there, so credit goes to wikichip.
alias testForString {
while ($read(file.txt, nw, *test*, $calc($readn + 1))) {
var %line = $v1
; you can add your own words in the regex, separate them with a pipe (|)
noop $regex(%line,/(word1|word2|word3|test)/)
echo -a Amount of results: $regml(0)
}
}
$readn is an identifier that returns the number of the line that $read() last matched. It is used here to resume the search for the pattern (*test* in this case) on the following line.
In the code above, $readn starts at 0. We use $calc() to start at line 1. After every match, $read() resumes searching on the next line. When there are no more matches after the specified line, $read() returns $null, terminating the loop.
The w switch enables wildcard matching in the search.
The n switch prevents the text that is read from being evaluated as mSL code. In almost every case you should use the n switch, unless you really need the evaluation. Using the $read() identifier without the n switch can leave your script highly vulnerable.
The result is stored in a variable named %line so you can use it later if you need it.
After that we use noop to run a regex that matches your needs. You can then use $regml(0) to get the number of matches for your regex. With an if-statement you can check whether there are two or more matches.
Hope you find this helpful, if there's anything unclear, I will try to explain it better.
EDIT
#cp022
I can't comment, so I'll post my comment here, so how does that help in any way to read content from a text file?

POSIX or Linux API function to get file extension from path

I need a POSIX or Linux API function that takes a file path and returns the file's extension. Every platform should have one, but I can't find it for Linux. What's it called?

First use strrchr to find the last '.' in the pathname. If it doesn't exist, there's no "extension".
Next, use strchr to check whether there's any '/' after the last '.'. If so, the last '.' is in a directory component, not the filename, so there's no extension.
Otherwise, you found the extension. You can use the pointer to the position one past the '.' directly as a C string. No need to copy it to new storage unless the original string will be freed or clobbered before you use it.
Note: The above is assuming you define "extension" as only the final '.'-delimited component. If you want to consider things like .tar.gz and .cpp.bak as extensions, a slightly different approach works:
First, use strrchr to find the final '/'. If not found, treat the start of the string as your result.
Second, use strchr to find the first '.' starting from the position you just found. The result is your extension.
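Both approaches can be sketched with the C string functions as follows. The function names lastExtension and fullExtension are hypothetical, and returning the position one past the dot in both variants is a choice made here for consistency:

```cpp
#include <cstring>

// Returns a pointer to the extension (without the dot) inside `path`,
// or NULL if there is none. Only the final '.'-delimited component counts.
const char* lastExtension(const char* path)
{
    const char* dot = std::strrchr(path, '.');
    if (dot == NULL)
        return NULL;                    // no '.' at all
    if (std::strchr(dot, '/') != NULL)
        return NULL;                    // last '.' is in a directory component
    return dot + 1;
}

// Variant for compound extensions such as "tar.gz".
const char* fullExtension(const char* path)
{
    const char* slash = std::strrchr(path, '/');
    const char* name = (slash != NULL) ? slash + 1 : path;
    const char* dot = std::strchr(name, '.');
    return (dot != NULL) ? dot + 1 : NULL;
}
```

The returned pointers refer into the original string, so they are only valid as long as that string is.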

I don't think there's a default function for this.
In my filesystem library, I just apply string operations.
First, I get the filename with extension from the full path, looking for / separators and extracting everything after the last one. Then, I grab everything after the first . dot character, including the dot itself. It worked well so far.
Remember that some system files can start with a . dot character - so check if the filename begins with the dot character before extracting the extension.
Algorithm
Get file name from full path by removing folder names from the left:
/home/test/.myfile.cpp.bak ->
/test/.myfile.cpp.bak ->
/.myfile.cpp.bak ->
.myfile.cpp.bak
Check if the file name begins with .:
If it does, remove it from current file name .myfile.cpp.bak -> myfile.cpp.bak
Now, extract everything after the first . you encounter from the left (if you want multiple extensions) - otherwise, extract everything after the last . from the left
myfile.cpp.bak -> .cpp.bak (first case)
myfile.cpp.bak -> .bak (second case)
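The steps above could be written with std::string roughly like this. The function name extensionOf and the allParts flag are hypothetical, and the sketch assumes '/' is the only path separator:

```cpp
#include <string>

// Hypothetical helper following the steps above. With allParts == true it
// returns everything from the first '.', otherwise from the last '.'.
std::string extensionOf(const std::string& path, bool allParts)
{
    // Step 1: keep only the file name (everything after the last '/').
    const std::string::size_type slash = path.rfind('/');
    std::string name = (slash == std::string::npos) ? path : path.substr(slash + 1);

    // Step 2: a leading '.' marks a hidden file, not an extension.
    if (!name.empty() && name[0] == '.')
        name.erase(0, 1);

    // Step 3: take the extension, including the dot itself.
    const std::string::size_type dot = allParts ? name.find('.') : name.rfind('.');
    return (dot == std::string::npos) ? std::string() : name.substr(dot);
}
```

For /home/test/.myfile.cpp.bak this gives .cpp.bak with allParts set and .bak without it, and an empty string for a file such as /home/test/.bashrc.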

Including boost for filesystem is a bit too much. But as the boost implementation has reached TR2 and is implemented in Visual Studio, it's maybe time to start looking at it.
http://cpprocks.com/introduction-to-tr2-filesystem-library-in-vs2012/
http://msdn.microsoft.com/en-us/library/hh874694.aspx

What seems to me the best way to solve this problem (in absence of API function, which itself is weird) is to combine Vittorio's and R.'s answers with basename function that takes a path and returns the file name, if the path points to a file: http://linux.die.net/man/3/basename
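I also convert the resulting string to a wide-character string with mbstowcs and do all the finding with std::wstring:
std::wstring fileExtFromPath (const char * path)
{
    // POSIX basename may modify its argument, so work on a copy
    std::string pathCopy (path);
    const char * fileName = basename(&pathCopy[0]);
    wchar_t buffer [PATH_MAX] = {0}; // PATH_MAX from <limits.h>; use mblen if you don't like it
    if (mbstowcs(buffer, fileName, PATH_MAX - 1) == (size_t)-1)
        return std::wstring(); // invalid multibyte sequence
    const std::wstring fileNameW (buffer);
    const size_t pointPosition = fileNameW.rfind(L'.');
    // npos means there is no '.', so there is no extension
    const std::wstring fileExtW = pointPosition == std::wstring::npos ? std::wstring() : fileNameW.substr(pointPosition + 1);
    return fileExtW;
}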

PowerShell isolating parts of strings

I have no experience with regular expressions and would love some help and suggestions on a possible solution to deleting parts of file names contained in a csv file.
Problem:
A list of exported file names contains a random unique identifier that I need isolated. The unique identifier has no predictable pattern, however the aspects which need removing do. Each file name ends with one of the following variations:
V, -V, or %20V followed by a random number sequence with possible spaces or additional "-" or "_" characters, and ending with .PDF
examples:
GTD-LVOE-43-0021 V10 0.PDF
GTD-LVOE-43-0021-V34-2.PDF
GTD-LVOE-43-0021_V02_9.PDF
GTD-LVOE-43-0021 V49.9.PDF
Solution:
My plan was to write a script to select the first occurrence of a V from the end of the string and then delete it and everything to the right of it. Then the file names can be cleaned up by deleting any "-", "_" or white space that occurs at the end of a string.
Question:
How can I do this with a regular expression and is my line of thinking even close to the right approach to solving this?

REGEX: [\s\-_]V.*?\.PDF
Might do the trick. You'd still need to replace away any leading - and _, but it should get you down the path, hopefully.
This would read as follows..
start with a whitespace, - OR _ followed by a V. Then take everything until you get to the first .PDF

checking float inside a string and return result?

I have a text file which I read with getline into a string. The file is like this: 0.2abc 0.2 .2abc .2 abc.2abc abc.2 abc0.20 .2 . 20
I want to check the result and then parse it into separate floats. The result is: 0.2 0.2abc 2 20 2abc abc0.20 abc
To explain: check whether there is a digit on both sides of the '.' (full stop), whether or not other characters are attached. If only one side of the '.' is a digit, the '.' is treated as a full stop.
How can I parse a string into separate results like that? I used an iterator to check for the '.' and its position, but still got stuck.

The first thing you need to do is split the input into words. Easy, just don't use .getline()
but instead rely on while (cin >> strWord ) { /* do stuff with word*/ };
The second thing is to kick out bad input words early: words of 2 characters or less, with more than one ., or with the . first or last.
You now know that the . is somewhere in the middle. find() will give you an iterator. ++ and -- give you the next and previous iterators. * gives you the character that the iterator points to. isdigit() tells you whether that character is a digit. Add ingredients together and you're done.
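The per-word check described above might be sketched as follows. The function name hasDigitsAroundDot is hypothetical, and the sketch uses string indices rather than iterators, which is an implementation choice made here:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if `word` contains exactly one '.', not at
// either end, with a digit immediately before and after it.
bool hasDigitsAroundDot(const std::string& word)
{
    if (word.size() < 3)
        return false;                       // too short to be digit '.' digit
    const std::string::size_type dot = word.find('.');
    if (dot == std::string::npos || dot != word.rfind('.'))
        return false;                       // no '.', or more than one
    if (dot == 0 || dot == word.size() - 1)
        return false;                       // '.' first or last
    return std::isdigit(static_cast<unsigned char>(word[dot - 1])) &&
           std::isdigit(static_cast<unsigned char>(word[dot + 1]));
}
```

It would be called once per word inside the while (cin >> strWord) loop. For the sample input it accepts 0.2abc, 0.2 and abc0.20 and rejects .2abc, abc.2 and a lone '.'.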

Seems like some fairly complicated advice above -- and not necessarily helpful.
Your question does not make it entirely clear what the end result should look like. Do you want an array of floating point numbers? Do you just want the sum? Do you want to print out the results?
If you want help with homework, the best policy is to post your own attempt and then others can help you improve it, to make it work.
One approach that might help is to try to break the string into sub-strings (tokens) and discard the junk.
Write a function that accepts a character and returns true (this is part of a floating point number) or false (it isn't).
Scan along the string using an iterator or an index.
While current char is not part of a token, skip it.
If you find a token char, while current char is part of a token, copy it to another string
etc. to get all floating point substrings.
Then you can use std::stringstream or ::atof() to convert.
Have a bit of a go and post what you can get done.

Sounds like you could use a regex to extract your numbers.
Try this regex to extract the floating-point values within a string:
[0-9]+\.[0-9]+
Keep in mind that this won't extract integer values, e.g. 234abc.
I don't know if there is a built-in way to use regex in C++, but I found this library with a quick Google search which allows you to use regex in C++.
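Since C++11 the standard library ships its own <regex> header, so the pattern above can be tried without an external library. A minimal sketch, with extractFloats as a hypothetical function name:

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every substring of `text` that matches digits '.' digits.
std::vector<std::string> extractFloats(const std::string& text)
{
    static const std::regex pattern("[0-9]+\\.[0-9]+");
    std::vector<std::string> matches;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it)
        matches.push_back(it->str());
    return matches;
}
```

On the sample line from the question this finds 0.2, 0.2 and 0.20. Each match could then be converted with std::stod or a std::stringstream if numeric values are needed.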

Sounds like you should look at the "Interpreter" Design Pattern.
Or you could use the "State" Design Pattern and do it by hand.
There should be plenty of examples of both on the web.
