Check a Location/Path - c++

I'm really new to C++ and want to try it out a bit. I normally work with Java/PHP.
I have a string like:
std::string location = "file:///C:/Program Files (x86)/Demo/";
or
std::string location = "http://www.example.com/site.php";
How can I check:
a) whether the location has the domain www.example.com or example1.com
b) whether the location starts with http:// or https://
In Java or PHP I would use a regular expression, but I have no idea how to start in C++.
My first attempt was to check for http://:
std::string location = "";
if (strncmp(location.c_str(), "http://", 7)) {
    /* yepp */
} else {
    /* nope */
}
But that won't work.
I hope you can help me.

I'll attack your question in three ways, from most to least specific.
1 - Instead of reinventing the wheel, you can opt for suggestions already given here on Stack Overflow:
Easy way to parse a url in C++ cross platform?
2 - Regexps are indeed fully supported in C++. You might refer to the following as a start:
http://www.cplusplus.com/reference/regex/regex_search/
3 - In general, it is not advisable to utilize C-style functions such as strncmp to compare strings. The std::string class has several substring search functions that you'd best be using. The most basic of them is the following:
http://www.cplusplus.com/reference/string/string/find/
Hope this helps you get on the right track regardless of how you choose to proceed.
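As a minimal sketch of points 2 and 3, the two checks from the question could look like the code below. Note that strncmp returns 0 when the compared prefixes are equal, which is why the condition in the question takes the wrong branch. The function names hasHttpScheme, extractHost and isAllowedHost are made up for this illustration, and the regular expression is a deliberately simple pattern rather than a full URL parser (it ignores user info such as user@host, and the host comparison is case-sensitive).

```cpp
#include <regex>
#include <string>

// Returns true if `url` begins with "http://" or "https://".
// rfind(prefix, 0) can only match at position 0, so it acts as a "starts with" test.
bool hasHttpScheme(const std::string& url) {
    return url.rfind("http://", 0) == 0 || url.rfind("https://", 0) == 0;
}

// Extracts the host part of an http(s) URL, or returns an empty string
// if the URL does not start with an http(s) scheme.
std::string extractHost(const std::string& url) {
    static const std::regex pattern(R"(^https?://([^/:?#]+))", std::regex::icase);
    std::smatch match;
    if (std::regex_search(url, match, pattern)) {
        return match[1].str();
    }
    return "";
}

// Returns true if the URL's host is one of the two domains from the question.
bool isAllowedHost(const std::string& url) {
    const std::string host = extractHost(url);
    return host == "www.example.com" || host == "example1.com";
}
```

With the two strings from the question, hasHttpScheme is false for the file:/// location and true for http://www.example.com/site.php, and isAllowedHost is true only for the latter.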

Related

GoogleTagManager | Parsing URL - With or Without regex

I want to pass the user's language into a variable.
However, my client can't pass (or hasn't passed) this information through the dataLayer, so the only solution I have is to use the URL path.
The structure is:
http://www.website.be/en/subcategory/subsubcategory
I want to extract the "en" part.
I have no idea how to get this. I checked Stack Overflow and Google; some people mention regex, others Custom JavaScript, but nothing matched my specific setup.
Do you have an idea how to proceed?
Many thanks!
Ludo
Make sure the built-in {{Page Path}} variable is enabled. Create a Custom JavaScript variable:
function() {
    var parts = {{Page Path}}.split("/");
    return parts[1];
}
This splits the path by the path delimiter "/" and gives you an array with the parts. Since the page path has a leading slash (I think), the first part is empty, so you return the second one (since array indexing starts with 0 the second array element has the index 1).
This might need a bit of refinement (for pages that do not start with a language signifier, if any), but that's the basic idea.
Regex is an alternative (via the regex table variable), but the above solution is a little easier to implement.

cpp check if the string is a valid domain name?

As the title says, is there a quick way to do that? I don't need a solid solution, just anything that can differentiate. For example:
http://asdasd/
is not a valid domain name, where
http://asd.asdasd.asd
is a valid domain name.
I searched for a solution, and the closest (simple) one I found is this one, in Python.
But that's for Python; I need to do it in C++. Any help?
Can it be done by using "string manipulation" only? Like, substring?
I believe this can be done with libcurl.
Barring the fact that http://... is not a domain name but a URL, and that asdasd is a valid domain name if set up as a search domain (such as on a local network), purely checking the string syntax can be done with a simple set of strncmp, strchr and strstr calls:
const char *str = "http://abd.xxx";
bool valid = strncmp(str, "http://", 7) == 0 && str[7] && strchr(str + 7, '.');
This checks that the string starts with http://, that there is more after the http://, and that the remainder contains a dot. Note that strncmp returns 0 on a match, hence the comparison with 0. If you also want to handle a URL that contains an actual path, like http://expample.com/mypath.txt, then the example becomes more complex, but you didn't specify whether that was needed.
Alternatively, you can use regex with the pattern from the Python answer you pointed to yourself.
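For the "more complex" case where the URL carries a path, one possible sketch using only std::string operations is shown below. The function name looksLikeDomainUrl is made up for this illustration, and the rule it applies (the host must contain a dot that is neither its first nor its last character) is a rough syntactic heuristic, not real domain validation.

```cpp
#include <string>

// Returns true if `url` starts with "http://", has a non-empty host,
// and that host contains a dot that is neither its first nor last character.
bool looksLikeDomainUrl(const std::string& url) {
    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) != 0) {
        return false;
    }
    // The host ends at the first '/' after the scheme, or at the end of the string.
    const std::size_t hostEnd = url.find('/', scheme.size());
    const std::size_t hostLen = (hostEnd == std::string::npos)
                                    ? std::string::npos
                                    : hostEnd - scheme.size();
    const std::string host = url.substr(scheme.size(), hostLen);
    // Only the host is inspected, so a dot in the path (e.g. "file.txt") does not count.
    const std::size_t dot = host.find('.');
    return dot != std::string::npos && dot > 0 && dot + 1 < host.size();
}
```

Cutting the host off at the first slash is the main difference from the strchr version: http://asdasd/file.txt is rejected because the only dot is in the path.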

What is the minimum set of characters I need to ban to avoid XSS

I'm writing a simple website and I appreciate my responsibility to keep my site from being used for XSS; however, I don't really want to spend much time on a detailed or heavyweight solution. If I were to simply ban a list of characters (ones that people weren't going to need to describe their favourite sausage anyway), what is the smallest list I could get away with?
Users still need the ability to write a paragraph of plain text. So I'll need to keep at least:
' " , . ; : - ( )
in the hope that some of the less grammatically challenged users can apply them accurately. I was going to start with < and >, but searching indicated that, on its own, that isn't necessarily enough.
Just because you need to keep
' " , . ; : - ( )
doesn't mean you need to keep them as those literal characters. Convert all special characters to their HTML entities (e.g. convert all < to &lt;).
You probably shouldn't just ban characters. Instead prefer to HTML escape any input before outputting it back to the user. See OWASP XSS (Cross Site Scripting) Prevention Cheat Sheet.
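As a rough illustration of escaping on output, here is a minimal sketch in C++. The language is chosen only for the example, since the question does not name a platform, and the function name escapeHtml is made up. It covers the five characters that matter in HTML element content and quoted attribute values; it is not sufficient for other contexts such as JavaScript, CSS or URLs, and an established library for your platform is preferable where one exists.

```cpp
#include <string>

// Replaces the five HTML-significant characters with entities.
// Intended for text placed in element content or quoted attribute values.
std::string escapeHtml(const std::string& input) {
    std::string out;
    out.reserve(input.size());
    for (char c : input) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&#39;";  break;
            default:   out += c;        break;
        }
    }
    return out;
}
```

Users can still type quotes, apostrophes and angle brackets; the characters are simply rendered as text instead of being interpreted as markup.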
You haven't mentioned the server platform you're working with (.NET, Java, PHP, etc.), and each has slightly different ways of dealing with XSS. However, there are two constants:
Always validate your input against a white-list. Don't define what you won't allow, rather define what you will allow.
Always encode your output and do so for the correct language. Most platforms have libraries to do this for you (e.g. AntiXSS for ASP.NET).
More info on understanding XSS in greater depth here: OWASP Top 10 for .NET developers part 2: Cross-Site Scripting (XSS)

Which of the C++ INI (or any other format) loading libraries support multiple keys?

I'm currently using SimpleINI and I'm not sure if I can do it with this but my configuration file is going to look like this
name = someone
service = something
match = blahblahblah
match = something
match = some more junk
I know in advance which of the keys support multiple values and I want those values to be stored in an array or something so I can loop through them later (order doesn't matter).
If not SimpleIni then which other library will support this? I'm a beginner to C++ so I'm looking for something easy to use. I have boost libraries but not sure if I should use it (seems complicated).
My application is windows specific so I don't need a cross platform solution in this case.
I've already seen this question - What is the easiest way to parse an INI File in C++? but not sure which of them I can use to accomplish this.
Any suggestions?
Do you not have an option to change the names to something like match1, match2, match3, etc.? That would seem to be the most straightforward way.
Beyond that, I've done things like this all the time. I simply wrote a few lines of code to parse the text file myself. It's not a complex task. But if you'd prefer to work with regular INI files, you need to look at changing the value names in the INI file.
Given you're on Windows, you may not need a library at all.
You would never know it by just browsing the documentation, but GetPrivateProfileString() in the WINAPI may do exactly what you want.
My Qt solution on the other SO thread applies. It is better because
Cross platform
Easy conversion to values other than strings
Simple
If you have an ini file like this (can be auto-generated from your list of objects using Qt API)
[Matches]
1\match=1
2\match=2
3\match=3
size=3
Here is the code that reads them back:
QSettings settings("test.ini", QSettings::IniFormat);
int size = settings.beginReadArray("Matches");
for (int i = 0; i < size; ++i) {
    settings.setArrayIndex(i);
    std::cout << settings.value("match").toInt() << std::endl;
}
settings.endArray();
Of course, another obvious option is to use a comma-separated string as your value and split it with QString::split().
SimpleINI accepts multiKey.
/** Are multiple values permitted for the same key? */
bool m_bAllowMultiKey;
[section]
name = someone
service = something
match = value1
match = othervalue
match = anotherValue
match = value4
Just create the CSimpleIniA with the second parameter as true.
// CSimpleIniA(bool a_bIsUtf8, bool a_bAllowMultiKey, bool a_bAllowMultiLine)
CSimpleIniA myINI{ false,true,false };
Use GetAllValues to get a list with all the values.
// from SimpleIni.h => typedef std::list<Entry> TNamesDepend;
CSimpleIniA::TNamesDepend values;
myINI.GetAllValues("section", "match", values);
Header file: SimpleIni.h

Regular expression for validating names and surnames?

Although this seems like a trivial question, I am quite sure it is not :)
I need to validate names and surnames of people from all over the world. Imagine a huge list of millions of names and surnames where I need to remove as well as possible any cruft I identify. How can I do that with a regular expression? If it were only English ones I think that this would cut it:
^[a-z -']+$
However, I need to support also these cases:
other punctuation symbols as they might be used in different countries (no idea which, but maybe you do!)
different Unicode letter sets (accented letter, greek, japanese, chinese, and so on)
no numbers or symbols or unnecessary punctuation or runes, etc..
titles, middle initials, suffixes are not part of this data
names are already separated by surnames.
we are prepared to force ultra rare names to be simplified (there's a person named '#' in existence, but it doesn't make sense to allow that character everywhere. Use pragmatism and good sense.)
note that many countries have laws about names so there are standards to follow
Is there a standard way of validating these fields I can implement to make sure that our website users have a great experience and can actually use their name when registering in the list?
I would be looking for something similar to the many "email address" regexes that you can find on google.
I sympathize with the need to constrain input in this situation, but I don't believe it is possible - Unicode is vast, expanding, and so is the subset used in names throughout the world.
Unlike email, there's no universally agreed-upon standard for the names people may use, or even which representations they may register as official with their respective governments. I suspect that any regex will eventually fail to pass a name considered valid by someone, somewhere in the world.
Of course, you do need to sanitize or escape input, to avoid the Little Bobby Tables problem. And there may be other constraints on which input you allow as well, such as the underlying systems used to store, render or manipulate names. As such, I recommend that you determine first the restrictions necessitated by the system your validation belongs to, and create a validation expression based on those alone. This may still cause inconvenience in some scenarios, but they should be rare.
I'll try to give a proper answer myself:
The only punctuations that should be allowed in a name are full stop, apostrophe and hyphen. I haven't seen any other case in the list of corner cases.
Regarding numbers, there's only one case with an 8. I think I can safely disallow that.
Regarding letters, any letter is valid.
I also want to include space.
This would sum up to this regex:
^[\p{L} \.'\-]+$
This presents one problem, i.e. the apostrophe can be used as an attack vector. It should be encoded.
So the validation code should be something like this (untested):
var name = nameParam.Trim();
if (!Regex.IsMatch(name, @"^[\p{L} \.'\-]+$"))
    throw new ArgumentException("nameParam");
name = name.Replace("'", "&#39;"); // &apos; does not work in IE
Can anyone think of a reason why a name should not pass this test or a XSS or SQL Injection that could pass?
complete tested solution
using System;
using System.Text.RegularExpressions;
namespace test
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            var names = new string[]{"Hello World",
                "John",
                "João",
                "タロウ",
                "やまだ",
                "山田",
                "先生",
                "мыхаыл",
                "Θεοκλεια",
                "आकाङ्क्षा",
                "علاء الدين",
                "אַבְרָהָם",
                "മലയാളം",
                "상",
                "D'Addario",
                "John-Doe",
                "P.A.M.",
                "' --",
                "<xss>",
                "\""
            };
            foreach (var nameParam in names)
            {
                Console.Write(nameParam+" ");
                var name = nameParam.Trim();
                if (!Regex.IsMatch(name, @"^[\p{L}\p{M}' \.\-]+$"))
                {
                    Console.WriteLine("fail");
                    continue;
                }
                name = name.Replace("'", "&#39;");
                Console.WriteLine(name);
            }
        }
    }
}
I would just allow everything (except an empty string) and assume the user knows what his name is.
There are 2 common cases:
You care that the name is accurate and are validating against a real paper passport or other identity document, or against a credit card.
You don't care that much and the user will be able to register as "Fred Smith" (or "Jane Doe") anyway.
In case (1), you can allow all characters because you're checking against a paper document.
In case (2), you may as well allow all characters because "123 456" is really no worse a pseudonym than "Abc Def".
I would think you would be better off excluding the characters you don't want with a regex. Trying to get every umlaut, accented e, hyphen, etc. will be pretty insane. Just exclude digits (but then what about a guy named "George Forman the 4th") and symbols you know you don't want like @#$%^ or what have you. But even then, using a regex will only guarantee that the input matches the regex, it will not tell you that it is a valid name.
EDIT after clarifying that this is trying to prevent XSS: A regex on a name field is obviously not going to stop XSS on its own. However, this article has a section on filtering that is a starting point if you want to go that route:
s/[\<\>\"\'\%\;\(\)\&\+]//g;
"Secure Programming for Linux and Unix HOWTO" by David A. Wheeler, v3.010 Edition (2003)
v3.72, 2015-09-19 is a more recent version.
BTW, do you plan to only permit the Latin alphabet, or do you also plan to try to validate Chinese, Arabic, Hindi, etc.?
As others have said, don't even try to do this. Step back and ask yourself what you are actually trying to accomplish. Then try to accomplish it without making any assumptions about what people's names are, or what they mean.
I don’t think that’s a good idea. Even if you find an appropriate regular expression (maybe using Unicode character properties), this wouldn’t prevent users from entering pseudo-names like John Doe, Max Mustermann (there even is a person with that name), Abcde Fghijk or Ababa Bebebe.
You could use the following regex to validate two names separated by a space:
^[A-Za-zÀ-ú]+ [A-Za-zÀ-ú]+$
or just use:
[[:lower:]] = [a-zà-ú]
[[:upper:]] =[A-ZÀ-Ú]
[[:alpha:]] = [A-Za-zÀ-ú]
[[:alnum:]] = [A-Za-zÀ-ú0-9]
It's a very difficult problem to validate something like a name due to all the corner cases possible.
Corner Cases
Anything anything here
Sanitize the inputs and let them enter whatever they want for a name. Deciding what is a valid name and what is not is probably way outside the scope of whatever you're doing, given that the range of potential strange (and legal) names is nearly infinite.
If they want to call themselves Tricyclopltz^2-Glockenschpiel, that's their problem, not yours.
A very contentious subject that I seem to have stumbled upon here. However, sometimes it's nice to head dear Little Bobby Tables off at the pass and send little Robert to the headmaster's office along with his semicolons and SQL comment lines (--).
This regex in VB.NET includes regular alphabetic characters and various circumflexed European characters. However, poor old James Mc'Tristan-Smythe the 3rd will have to enter his pedigree as Jim the Third.
<asp:RegularExpressionValidator ID="RegExValid1" Runat="server"
ErrorMessage="ERROR: Please enter a valid surname<br/>" SetFocusOnError="true" Display="Dynamic"
ControlToValidate="txtSurname" ValidationGroup="MandatoryContent"
ValidationExpression="^[A-Za-z'\-\p{L}\p{Zs}\p{Lu}\p{Ll}\']+$" />
This one worked perfectly for me in JavaScript:
^[a-zA-Z]+[\s|-]?[a-zA-Z]+[\s|-]?[a-zA-Z]+$
Here is the method:
function isValidName(name) {
    var found = name.search(/^[a-zA-Z]+[\s|-]?[a-zA-Z]+[\s|-]?[a-zA-Z]+$/);
    return found > -1;
}
Steps:
first remove all accents
apply the regular expression
To strip the accents:
private static string RemoveAccents(string s)
{
    s = s.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.Length; i++)
    {
        if (CharUnicodeInfo.GetUnicodeCategory(s[i]) != UnicodeCategory.NonSpacingMark) sb.Append(s[i]);
    }
    return sb.ToString();
}
This somewhat helps:
^[a-zA-Z]'?([a-zA-Z]|\.| |-)+$
This one should work
^([A-Z]{1}+[a-z\-\.\']*+[\s]?)*
Add some special characters if you need them.