How to separate a line of input into multiple variables? - c++

I have a file that contains rows and columns of information like:
104857 Big Screen TV 567.95
573823 Blender 45.25
I need to parse this information into three separate items, a string containing the identification number on the left, a string containing the item name, and a double variable containing the price. The information is always found in the same columns, i.e. in the same order.
I am having trouble accomplishing this. Even when not reading from the file and just using a sample string, my attempt just outputs a jumbled mess:
string input = "104857 Big Screen TV 567.95";
string tempone = "";
string temptwo = input.substr(0,1);
tempone += temptwo;
for(int i=1 ; temptwo != " " && i < input.length() ; i++)
{
temptwo = input.substr(j,j);
tempone += temp2;
}
cout << tempone;
I've tried tweaking the above code for quite some time, but no luck, and I can't think of any other way to do it at the moment.

You can find the first space and the last space using std::find_first_of and std::find_last_of . You can use this to better split the string into 3 - first space comes after the first variable and the last space comes before the third variable, everything in between is the second variable.

How about following pseudocode:
string input = "104857 Big Screen TV 567.95";
string[] parsed_output = input.split(" "); // split input string with 'space' as delimiter
// parsed_output[0] = 104857
// parsed_output[1] = Big
// parsed_output[2] = Screen
// parsed_output[3] = TV
// parsed_output[4] = 567.95
int id = stringToInt(parsed_output[0]);
string product = concat(parsed_output[1], parsed_output[2], ... ,parsed_output[length-2]);
double price = stringToDouble(parsed_output[length-1]);
I hope, that's clear.

Well try breaking down the files components:
you know a number always comes first, and we also know a number has no white spaces.
The string following the number CAN have whitespaces, but won't contain any numbers(i would assume)
After this title, you're going to have more numbers(with no whitespaces)
from these components, you can deduce:
grabbing the first number is as simple as reading in using the filestream <<.
getting the string requires you to check until you reach a number, grabbing one character at a time and inserting that into a string. the last number is just like the first, using the filestream <<
This seems like homework so i'll let you put the rest together.

I would try a regular expression, something along these lines:
^([0-9]+)\s+(.+)\s+([0-9]+\.[0-9]+)$
I am not very good at regex syntax, but ([0-9]+) corresponds to a sequence of digits (this is the id), ([0-9]+\.[0-9]+) is the floating point number (price) and (.+) is the string that is separated from the two number by sequences of "space" characters: \s+.
The next step would be to check if you need this to work with prices like ".50" or "10".

Related

How to Get substring from given QString in Qt

I have a QString like this:
QString fileData = "SOFT_PACKAGES.ABC=MY_DISPLAY_OS:MY-Display-OS.2022-3.10.25.10086-1.myApplication"
What I need to do is to create substrings as follow:
SoftwareName = MY_DISPLAY_OS //text after ':'
Version = 10.25.10086-1
Release = 2022-3
I tried using QString QString::sliced(qsizetype pos, qsizetype n) const but didn't worked as I'm using 5.9 and this is supported on 6.0.
QString fileData = "SOFT_PACKAGES.ABC=MY_DISPLAY_OS:MY-Display-OS.2022-3.10.25.10086-1.myApplication";
QString SoftwareName = fileData.sliced(fileData.lastIndexOf(':'), fileData.indexOf('.'));
Please help me to code this in Qt.
Use QString::split 3 times:
Split by QLatin1Char('=') to two parts:
SOFT_PACKAGES.ABC
MY_DISPLAY_OS:MY-Display-OS.2022-3.10.25.10086-1.myApplication
Next, split 2nd part by QLatin1Char(':'), probably again to just 2 parts if there can never be more than 2 parts, so the 2nd part can contain colons:
MY_DISPLAY_OS
MY-Display-OS.2022-3.10.25.10086-1.myApplication
Finally, split 2nd part of previous step by QLatin1Char('.'):
MY-Display-OS
2022-3
10
25
10086-1
myApplication
Now just assemble your required output strings from these parts. If exact number of parts is unknown, you can get Version = 10.25.10086-1 by removing two first elements and last element from the final list above, and then joining the rest by QLatin1Char('.'). If indexes are known and fixed, you can just use QStringLiteral("%1.%2.%3").arg(....
One way is using
QString::mid(int startIndex, int howManyChar);
so you probably want something like this:
QString fileData = "SOFT_PACKAGES.ABC=MY_DISPLAY_OS:MY-Display-OS.2022-3.10.25.10086-1.myApplication";
QString SoftwareName = fileData.mid(fileData.indexOf('.')+1, (fileData.lastIndexOf(':') - fileData.indexOf('.')-1));
To extract the other part you requested and if the number of '.' characters remains constant along all strings you want to check you can use the second argument IndexOf to find shift the starting location to skip known many occurences of '.', so for example
int StartIndex = 0;
int firstIndex = fileData.indexOf('.');
for (int i=0; i<=6; i++) {
StartIndex += fileData.indexOf('.', firstIndex+StartIndex);
}
int EndIndex = fileData.indexOf('.', StartIndex+8);
should give the right indices to be cut out with
QString SoftwareVersion = fileData.mid(StartIndex, EndIndex - StartIndex);
If the strings to be parsed stay less consistent in this way, try switching to regular expressions, they are the more flexible approach.
In my experience, using regular expressions for these types of tasks is generally simpler and more robust. You can do this with a regular expressions with the following:
// Create the regular expression.
// Using C++ raw string literal to reduce use of escape characters.
QRegularExpression re(R"(.+=([\w_]+):[\w-]+\.(\d+-\d+)\.(\d+\.\d+\.\d+-?\d+))");
// Match against your string
auto match = re.match("SOFT_PACKAGES.ABC=MY_DISPLAY_OS:MY-Display-OS.2022-3.10.25.10086-1.myApplication");
// Now extract the portions you are interested in
// match.captured(0) is always the full string that matched the entire expression
const auto softwareName = match.captured(1);
const auto version = match.captured(3);
const auto release = match.captured(2);
Of course for this to make sense, you have to understand regex, so here is my explanation of the regex used here:
.+=([\w_]+):[\w-]+\.(\d+-\d+)\.(\d+\.\d+\.\d+-?\d+)
.+=
get all characters up to and including the first equals sign
([\w_]+)
capture one or more word characters (alphanumeric characters) or underscores
:
a colon
[\w-]+\.
one or more alphanumeric or dash characters followed by a single period
(\d+-\d+)
capture one or more of digits followed by a dash followed by one or more digits
\.
a single period
(\d+\.\d+\.\d+-?\d*)
capture three sets of digits with periods in between, then an optional dash, and any number of digits (could be zero digits)
I think it is generally easier to make a regex that handles changes to the input - lets say version becomes 10.25.10087 - more easily than manually parsing things by index.
Regex is a powerful tool once you get used to it, but it can certainly seem daunting at first.
Example of this regex on regex101.com: https://regex101.com/r/dj3Z4U/1

How can I extract a file name based on number string?

I have a list of filenames in a struct array, example:
4x1 struct array with fields:
name
date
bytes
isdir
datenum
where files.name
ans =
ts.01094000.crest.csv
ans =
ts.01100600.crest.csv
etc.
I have another list of numbers (say, 1094000). And I want to find the corresponding file name from the struct.
Please note, that 1094000 doesn't have preceding 0. Often there might be other numbers. So I want to search for '1094000' and find that name.
I know I can do it using Regex. But I have never used that before. And finding it difficult to write for numbers instead of text using strfind. Any suggestion or another method is welcome.
What I have tried:
regexp(files.name,'ts.(\d*)1094000.crest.csv','match');
I think the regular expression you'd want is more like
filenames = {'ts.01100600.crest.csv','ts.01094000.crest.csv'};
matches = regexp(filenames, ['ts\.0*' num2str(1094000) '\.crest\.csv']);
matches = ~cellfun('isempty', matches);
filenames(matches)
For a solution with strfind...
Pre-16b:
match = ~cellfun('isempty', strfind({files.name}, num2str(1094000)),'UniformOutput',true)
files(match)
16b+:
match = contains({files.name}, string(1094000))
files(match)
However, the strfind way might have issues if the number you are looking for exists in unexpected places such as looking for 10 in ["01000" "00101"].
If your filenames match the pattern ts.NUMBER.crest.csv, then in 16b+ you could do:
str = {files.name};
str = extractBetween(str,4,'.');
str = strip(str,'left','0');
matches = str == string(1094000);
files(matches)

Dynamic regexprep in MATLAB

I have the following strings in a long string:
a=b=c=d;
a=b;
a=b=c=d=e=f;
I want to first search for above mentioned pattern (X=Y=...=Z) and then output like the following for each of the above mentioned strings:
a=d;
b=d;
c=d;
a=b;
a=f;
b=f;
c=f;
d=f;
e=f;
In general, I want all the variables to have an equal sign with the last variable on the extreme right of the string. Is there a way I can do it using regexprep in MATLAB. I am able to do it for a fixed length string, but for variable length, I have no idea how to achieve this. Any help is appreciated.
My attempt for the case of two equal signs is as follows:
funstr = regexprep(funstr, '([^;])+\s*=\s*+(\w+)+\s*=\s*([^;])+;', '$1 = $3; \n $2 = $3;\n');
Not a regexp but if you stick to Matlab you can make use of the cellfun function to avoid loop:
str = 'a=b=c=d=e=f;' ; %// input string
list = strsplit(str,'=') ;
strout = cellfun( #(a) [a,'=',list{end}] , list(1:end-1), 'uni', 0).' %'// Horchler simplification of the previous solution below
%// this does the same than above but more convoluted
%// strout = cellfun( #(a,b) cat(2,a,'=',b) , list(1:end-1) , repmat(list(end),1,length(list)-1) , 'uni',0 ).'
Will give you:
strout =
'a=f;'
'b=f;'
'c=f;'
'd=f;'
'e=f;'
Note: As Horchler rightly pointed out in comment, although the cellfun instruction allows to compact your code, it is just a disguised loop. Moreover, since it runs on cell, it is notoriously slow. You won't see the difference on such simple inputs, but keep this use when super performances are not a major concern.
Now if you like regex you must like black magic code. If all your strings are in a cell array from the start, there is a way to (over)abuse of the cellfun capabilities to obscure your code do it all in one line.
Consider:
strlist = {
'a=b=c=d;'
'a=b;'
'a=b=c=d=e=f;'
};
Then you can have all your substring with:
strout = cellfun( #(s)cellfun(#(a,b)cat(2,a,'=',b),s(1:end-1),repmat(s(end),1,length(s)-1),'uni',0).' , cellfun(#(s) strsplit(s,'=') , strlist , 'uni',0 ) ,'uni',0)
>> strout{:}
ans =
'a=d;'
'b=d;'
'c=d;'
ans =
'a=b;'
ans =
'a=f;'
'b=f;'
'c=f;'
'd=f;'
'e=f;'
This gives you a 3x1 cell array. One cell for each group of substring. If you want to concatenate them all then simply: strall = cat(2,strout{:});
I haven't had much experience w/ Matlab; but your problem can be solved by a simple string split function.
[parts, m] = strsplit( funstr, {' ', '='}, 'CollapseDelimiters', true )
Now, store the last part of parts; and iterate over parts until that:
len = length( parts )
for i = 1:len-1
print( strcat(parts(i), ' = ', parts(len)) )
end
I do not know what exactly is the print function in matlab. You can update that accordingly.
There isn't a single Regex that you can write that will cover all the cases. As posted on this answer:
https://stackoverflow.com/a/5019658/3393095
However, you have a few alternatives to achieve your final result:
You can get all the values in the line with regexp, pick the last value, then use a for loop iterating throughout the other values to generate the output. The regex to get the values would be this:
matchStr = regexp(str,'([^=;\s]*)','match')
If you want to use regexprep at any means, you should write a pattern generator and a replace expression generator, based on number of '=' in the input string, and pass these as parameters of your regexprep func.
You can forget about Regex and Split the input to generate the output looping throughout the values (similarly to alternative #1) .

Why is max number ignoring two-digit numbers?

At the moment I am saving a set of variables to a text file. I am doing following to check if my code works, but whenever I use a two-digit numbers such as 10 it would not print this number as the max number.
If my text file looked like this.
tom:5
tom:10
tom:1
It would output 5 as the max number.
name = input('name')
score = 4
if name == 'tom':
fo= open('tom.txt','a')
fo.write('Tom: ')
fo.write(str(score ))
fo.write("\n")
fo.close()
if name == 'wood':
fo= open('wood.txt','a')
fo.write('Wood: ')
fo.write(str(score ))
fo.write("\n")
fo.close()
tomL2 = []
woodL2 = []
fo = open('tom.txt','r')
tomL = fo.readlines()
tomLi = tomL2 + tomL
fo.close
tomLL=max(tomLi)
print(tomLL)
fo = open('wood.txt','r')
woodL = fo.readlines()
woodLi = woodL2 + woodL
fo.close
woodLL=max(woodLi)
print(woodLL)
You are comparing strings, not numbers. You need to convert them into numbers before using max. For example, you have:
tomL = fo.readlines()
This contains a list of strings:
['tom:5\n', 'tom:10\n', 'tom:1\n']
Strings are ordered lexicographically (much like how words would be ordered in an English dictionary). If you want to compare numbers, you need to turn them into numbers first:
tomL_scores = [int(s.split(':')[1]) for s in tomL]
The parsing is done in the following way:
….split(':') separates the string into parts using a colon as the delimiter:
'tom:5\n' becomes ['tom', '5\n']
…[1] chooses the second element from the list:
['tom', '5\n'] becomes '5\n'
int(…) converts a string into an integer:
'5\n' becomes 5
The list comprehension [… for s in tomL] applies this sequence of operations to every element of the list.
Note that int (or similarly float) are rather picky about what it accepts: it must be in the form of a valid numeric literal or it will be rejected with an error (although preceding and trailing whitespace is allowed). This is why you need ….split(':')[1] to massage the string into a form that it's willing to accept.
This will yield:
[5, 10, 1]
Now, you can apply max to obtain the largest score.
As a side-note, the statement
fo.close
will not close a file, since it doesn't actually call the function. To call the function you must enclose the arguments in parentheses, even if there are none:
fo.close()

Verify and cut a string using regexp in matlab

I have the following string:
{'output',{'variable','VGRG_Pos_Var1/Parameters/D_foo'},'date',734704.60904050921}
I would like to verify the format of the string that the word 'variable' is the second word and i would like to retrive the string after the last '/' in the 3rd string (In this example 'D_foo').
how could i verify this and retrive the sting i search?
I tried the following:
regexp(str,'{''\w+'',{''variable'',''([(a-z)|(A-Z)|/|_])+')
without success
REMARK
The string to analysis is not splited after the komma, it is only due to length of the string.
EDIT
my string is:
'{''output'',{''variable'',''VGRG_Pos_Var1/Parameters/D_foo''},''date'',734704.60904050921}';
and not a cell, which could be understood. I added the sybol ' at the start and end of the string to symbolizied that it is a string.
I realise that you mention using regexp in the question, but I'm not sure if this is a requirement? If other solutions are acceptable you could try this:
str='{''output'',{''variable'',''VGRG_Pos_Var1/Parameters/D_foo''},''date'',734704.60904050921}';
parts1=textscan( str, '%s','delimiter',{',','{','}'},'MultipleDelimsAsOne',1);
parts2=textscan( parts1{1}{3}, '%s','delimiter',{'/',''''},'MultipleDelimsAsOne',1);
string=parts2{1}{end}
match=strcmp(parts1{1}{2},'variable')
To answer the first part of your question, you can write this:
str = {'output',{'variable','VGRG_Pos_Var1/Parameters/D_foo'},'date',734704.60904050921};
temp = str(2); %this holds the cell containing the two strings
if cmpstr(temp{1}(1), 'variable')
%do stuff
end
For the second part you can do this:
str = {'output',{'variable','VGRG_Pos_Var1/Parameters/D_foo'},'date',734704.60904050921};
temp = str(2); %like before, this contains the cell
temp = temp{1}(2); %this picks out the second string in the cell
temp = char(temp); %turns the item from a cell to a string
res = strsplit(temp, '/'); %splits the string where '/' are found, res is an array of strings
string = res(3); %assuming there will always be just 2 '/'s.