R: Countrycode package not supporting regex as the origin - regex

I have a list of countries that I need to convert into a standardized format (iso3c). Some have long names, others have 2- or 3-character codes, and others do not display the whole country name, like "Africa" instead of "South Africa". I've done some research and decided to use the countrycode package in R. However, when I tried to use "regex", R doesn't seem to recognize it. I'm getting the error below:
> countrycode(data,"regex","iso3c", warn = TRUE)
Error in countrycode(data, "regex", "iso3c", :
Origin code not supported
Is there another option I need to use?
Thanks!

You can view the README for the countrycode package here https://github.com/vincentarelbundock/countrycode, or you can pull up the help file in R by entering this into your R console ?countrycode::countrycode.
"regex" is not a valid 'origin' value (2nd argument in the countrycode() function). You must use one of "cowc", "cown", "eurostat", "fao", "fips105", "imf", "ioc", "iso2c", "iso3c", "iso3n", "p4_ccode", "p4_scode", "un", "wb", "wb_api2c", "wb_api3c", "wvs", "country.name", "country.name.de" (using latest version 0.19).
If you use either of the following 'origin' values, regex matching will be performed automatically: "country.name" or "country.name.de"
If you're using a custom dictionary with the new (as of version 0.19) custom_dict argument, you must set the origin_regex argument to TRUE for regex matching to occur.
In your example, this should do what you want:
countrycode(data, origin = "country.name", destination = "iso3c", warn = TRUE)
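To make the idea of regex-based name matching concrete, here is a minimal sketch in Python rather than R. It only illustrates the general technique; the patterns and the `to_iso3c` helper are made up for this example and are not taken from the countrycode package, whose own dictionary is much larger.

```python
import re

# Hypothetical, hand-written patterns for illustration only; they are
# NOT the patterns shipped with the countrycode package.
COUNTRY_PATTERNS = [
    (re.compile(r"south.?africa", re.IGNORECASE), "ZAF"),
    (re.compile(r"united.?states|\busa\b", re.IGNORECASE), "USA"),
    (re.compile(r"\bgermany\b|deutschland", re.IGNORECASE), "DEU"),
]


def to_iso3c(name):
    """Return the ISO3 code of the first pattern that matches, else None."""
    for pattern, code in COUNTRY_PATTERNS:
        if pattern.search(name):
            return code
    return None


names = ["Republic of South Africa", "united states", "Atlantis"]
codes = [to_iso3c(n) for n in names]
```

Because the patterns are searched rather than compared for equality, long or differently cased names can still resolve, while anything unmatched comes back as `None` and can be reviewed by hand.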

Related

How can I fix this error in my go project - undefined: bson.RegEx

I get the following error from my editor:
undefined: bson.RegEx
due to this line of code in my go project:
regex := bson.M{"$regex": bson.RegEx{Pattern: id, Options: "i"}}
Why am I getting this error and how can I resolve it?
I've made sure that I'm importing:
"go.mongodb.org/mongo-driver/bson"
I've also checked inside bson/primitive/primitive.go to see that RegEx does exist
Using version 1.1.0 of mongo-driver.
Managed to work around the problem by removing this:
regex := bson.M{"$regex": bson.RegEx{Pattern: id, Options: "i"}}
and add this instead:
regex := `(?i).*` + name + `.*`
filter = bson.M{"name": bson.M{"$regex": regex}}
Using mongo-go-driver v1+, you can use the primitive.Regex type from the bson/primitive package. For example:
patternName := `.*` + name + `.*`
filter := bson.M{"name": primitive.Regex{Pattern: patternName, Options:"i"}}
cursor, err := collection.Find(context.TODO(), filter)
This is imported from "go.mongodb.org/mongo-driver/bson/primitive".
In addition, I would also suggest to consider the search pattern. You can optimise a regex search if the regular expression is a “prefix expression”, which means that all potential matches start with the same string. For example, ^name.* will be optimised by matching only against the values from the index that starts with name.
Also worth noting that case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilise case-insensitive indexes. Please see $regex index use for more information.
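As a rough illustration of the prefix-expression advice, the sketch below builds such a filter document in Python using plain dictionaries, so no driver is required. The helper name `prefix_filter` is hypothetical, and it assumes that the escaping produced by Python's `re.escape` is also acceptable to the server's regex engine.

```python
import re


def prefix_filter(field, prefix):
    """Build a filter document for values of `field` starting with `prefix`."""
    # re.escape keeps characters such as "." or "(" in user input literal.
    # The leading "^" is what makes this a prefix expression; note that
    # no case-insensitive option is set.
    return {field: {"$regex": "^" + re.escape(prefix)}}


query = prefix_filter("name", "jo.n")

# Local sanity check of the generated pattern with Python's own engine.
pattern = query["name"]["$regex"]
matches_literal = re.match(pattern, "jo.nny") is not None
matches_other = re.match(pattern, "john") is not None
```

Escaping the input matters here: without it, the `.` in the search term would act as a wildcard and the pattern would no longer describe a fixed prefix.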
Depending on the use case, consider MongoDB Text Search. For example, you can create a text index:
db.collection.createIndex({"name":"text"});
Which then you can search using:
filter := bson.M{"$text": bson.M{"$search": name}}
cur, err := collection.Find(context.TODO(), filter)
Depending on your requirements, there is also the MongoDB Atlas Full Text Search feature for advanced search functionality, e.g. text analysers.

how to use $set, $inc etc. in c++ mongodb driver

I was wondering how to use $set and $inc in the C++ MongoDB driver. I can only call update to write the whole JSON string into the db, like this: _db_conn.update(db_name_str, mongo::Query(key_word), mongo::fromjson(json_str), true);
Is there any way to update partially, using the $set or $inc operators?
When I searched the internet, I found a similar solution in Java, but I can't find it in the C++ documentation:
WriteResult result = mongoNsTemplate.getCollection("userStore").update(query.getQueryObject(), new BasicDBObject("$set", dbObject), true, false);
any help would be appreciated.
You can use $set, $inc or other operations. The document (json_str above) must be a document with update operator modifiers like $set or $inc instead of a full document. So if json_str contains something like:
string json_str = "{'$set': {'field1': 1}, '$inc': {'field2': 1}}";
Calling _db_conn.update(db_name_str, mongo::Query(key_word), mongo::fromjson(json_str), true); will:
update the first matching document (or insert a new one if none is found, since upsert is true)
set field1 to 1 (it will be added if it is not found)
increment field2 by 1 (or set to 1 if not found, or cause an error if found but not a numerical value)
This is equivalent to using the mongo shell.
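The effect of the two operators described above can be mimicked on a plain Python dictionary. This is only a sketch of the semantics as listed in this answer; `apply_update` is a hypothetical helper, not part of any MongoDB driver, and it covers nothing beyond `$set` and `$inc` on top-level fields.

```python
import numbers


def apply_update(doc, update):
    """Apply a small subset of update operators ($set, $inc) to a copy of doc."""
    result = dict(doc)
    for field, value in update.get("$set", {}).items():
        result[field] = value  # added if missing, overwritten otherwise
    for field, amount in update.get("$inc", {}).items():
        current = result.get(field, 0)  # a missing field starts from 0
        if isinstance(current, bool) or not isinstance(current, numbers.Number):
            raise TypeError(f"cannot $inc non-numeric field {field!r}")
        result[field] = current + amount
    return result


update = {"$set": {"field1": 1}, "$inc": {"field2": 1}}
first = apply_update({"name": "a"}, update)   # field1 and field2 are created
second = apply_update(first, update)          # field2 is incremented again
```

Applying the same update twice shows the difference between the operators: `field1` stays at 1, while `field2` keeps growing.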

Django query Unicode Issues

EDIT #2:
{'sql': 'SELECT "strains_terpene"."id", "strains_terpene"."name",
"strains_terpene"."short_desc", "strains_terpene"."long_desc",
"strains_terpene"."aroma", "strains_terpene"."flavor",
"strains_terpene"."effects" FROM "strains_terpene" WHERE
"strains_terpene"."name" = \'\xce±-Humulene\'', 'time': '0.000'}
Upon closer look it appears that django may be properly escaping the single quotes in the end. Had to take a different angle to see this by using this:
from django.db import connections
connections['default'].queries
So now the question remains, why even though python3, django, and postgres are all set to utf-8 is the unicode being encoded to local in the query?
Original Question:
Here is the runtime error:
strains.models.DoesNotExist: Terpene matching query does not exist.
Here is the str(Terpene.objects.filter(name='β-Caryophyllene').query):
SELECT "strains_terpene"."id", "strains_terpene"."name", "strains_terpene"."short_desc", "strains_terpene"."long_desc", "strains_terpene"."aroma", "strains_terpene"."flavor", "strains_terpene"."effects"
FROM "strains_terpene"
WHERE "strains_terpene"."name" = ß-Caryophyllene
Here is how postgres likes to see the query for it to work:
select * from strains_terpene where name = 'β-Caryophyllene'
Am I missing something here? Why is Django not wrapping my condition in single quotes?
PostgresDB is encoded with utf-8
Python 3 strings are unicode
EDIT:
I notice the query attribute is also converting the β to ß...
I thought this could be a conversion issue, considering I'm using Windows cmd for the Python shell.
So I did:
with open('log2.txt','w',encoding='utf-8') as f:
    print(Terpene.objects.filter(name='β-Caryophyllene').query, file=f)
And here are the results even when output directly to utf-8 plain text.
SELECT "strains_terpene"."id", "strains_terpene"."name", "strains_terpene"."short_desc", "strains_terpene"."long_desc", "strains_terpene"."aroma", "strains_terpene"."flavor", "strains_terpene"."effects"
FROM "strains_terpene"
WHERE "strains_terpene"."name" = ß-Caryophyllene
So now I am confused on two fronts. Why does Django choose to omit the single quotes for the where condition, and why is the lowercase beta being converted to an uppercase?
EXTRA INFO:
Here is the section of actual code.
Importing mass results via CSV.
The results dict stores the mapping between columns and Terpene Names
The first file, log.txt, is for verifying the contents of results
The second, log1.txt, is to verify the key before using it as the lookup condition
Finally, log2.txt verifies the SQL being sent to the database
First the Code Snippet:
results = {
u'α-Pinene': row[7],
u'β-Pinene': row[8],
u'Terpinolene': row[9],
u'Geraniol': row[10],
u'α-Terpinene': row[11],
u'γ-Terpinene': row[12],
u'Camphene': row[13],
u'Linalool': row[14],
u'd-Limonene': row[15],
u'Citral': row[16],
u'Myrcene': row[17],
u'α-Terpineol': row[18],
u'Citronellol': row[19],
u'dl-Menthol': row[20],
u'1-Borneol': row[21],
u'2-Piperidone': row[22],
u'β-Caryophyllene': row[23],
u'α-Humulene': row[24],
u'Caryophyllene Oxide': row[25],
}
with open("log.txt", "w") as text_file:
    print(results.keys(), file=text_file)
for r, v in results.items():
    if '<' not in v:
        value = float(v.replace("%", ""))
        with open("log1.txt", "w") as text2:
            print(r, file=text2)
        with open("log2.txt", "w", encoding="utf-8") as text3:
            print(Terpene.objects.filter(name=r).query, file=text3)
        TerpeneResult.objects.create(
            terpene=Terpene.objects.get(name=r),
            qa_sample=sample,
            result=value,
        )
And log.txt -- results.keys():
dict_keys(['dl-Menthol', 'Geraniol', 'Camphene', '1-Borneol', 'Linalool',
'α-Humulene', 'Caryophyllene Oxide', 'β-Caryophyllene', 'Citronellol',
'α-Pinene', '2-Piperidone', 'β-Pinene', 'd-Limonene', 'γ-Terpinene',
'Terpinolene', 'α-Terpineol', 'Myrcene', 'α-Terpinene', 'Citral'])
log1.txt -- α-Humulene
Lastly the sql being generated -- log2.txt:
SELECT "strains_terpene"."id", "strains_terpene"."name", "strains_terpene"."short_desc", "strains_terpene"."long_desc", "strains_terpene"."aroma", "strains_terpene"."flavor", "strains_terpene"."effects"
FROM "strains_terpene"
WHERE "strains_terpene"."name" = α-Humulene
Note the unicode being lost at the last moment when the sql is generated.

Regex JSON response Gatling stress tool

I want to capture a variable called scanNumber from an HTTP response looking like this:
{"resultCode":"SUCCESS","errorCode":null,"errorMessage":null,"profile":{"fullName":"TestFirstName TestMiddleName TestLastName","memberships":[{"name":"UA Gold Partner","number":"123-456-123-123","scanNumber":"123-456-123-123"}]}}
How can I do this with a regular expression?
The tool I am using is Gatling stress tool (with the Scala DSL)
I have tried to do it like this:
.check(jsonPath("""${scanNumber}""").saveAs("scanNr")))
But I get the error:
---- Errors --------------------------------------------------------------------
> Check extractor resolution crashed: No attribute named 'scanNu 5 (100,0%)
mber' is defined
You were close first time.
What you actually want is:
.check(jsonPath("""$..scanNumber""").saveAs("scanNr")))
or possibly:
.check(jsonPath("""$.profile.memberships[0].scanNumber""").saveAs("scanNr")))
Note that this uses jsonPath, not regular expressions. JsonPath should be more reliable than regex for this.
Check out the JsonPath spec for more advanced usage.
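To show what the two paths above select, here is a small sketch in Python that parses the response body from the question and walks it by hand. It does not use Gatling or a JsonPath library; the `find_all` helper is a hypothetical stand-in for the recursive `$..scanNumber` form.

```python
import json

# Response body copied from the question.
body = '{"resultCode":"SUCCESS","errorCode":null,"errorMessage":null,"profile":{"fullName":"TestFirstName TestMiddleName TestLastName","memberships":[{"name":"UA Gold Partner","number":"123-456-123-123","scanNumber":"123-456-123-123"}]}}'

data = json.loads(body)

# Same route as $.profile.memberships[0].scanNumber
scan_number = data["profile"]["memberships"][0]["scanNumber"]


def find_all(node, key):
    """Collect every value stored under `key` at any depth (like $..key)."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_all(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


all_scan_numbers = find_all(data, "scanNumber")
```

The explicit path returns a single value, whereas the recursive search returns a list, which is worth keeping in mind if a response may contain several memberships.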
Use this regex to match it anywhere in the JSON:
/"scanNumber":"[^"]+"/
If you want it to match only within the structure you showed, use:
/\{[^{[]+\{[^{[]+\[\{[^{[]*("scanNumber":"[^"]+")/
Since JSON fields may change their order, you should make your regex more tolerant of those changes:
val j = """{"resultCode":"SUCCESS","errorCode":null,"errorMessage":null,"profile":{"fullName":"TestFirstName TestMiddleName TestLastName","memberships":[{"name":"UA Gold Partner","number":"123-456-123-123","scanNumber":"123-456-123-123"}]}}"""
val scanNumberRegx = """\{.*"memberships":\[\{.*"scanNumber":"([^"]*)".*""".r
val scanNumberRegx(scanNumber) = j
scanNumber //String = 123-456-123-123
This will work even if the JSON fields appear in a different order (as long as the structure stays the same).
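For comparison, the same order-tolerant idea can be sketched in Python with a pattern that only looks at the `"scanNumber"` key and its string value. The tolerance for whitespace around the colon is an addition of this sketch, and `reordered` is a made-up variant of the response used to show that field order does not matter.

```python
import re

# Response body copied from the question.
body = '{"resultCode":"SUCCESS","errorCode":null,"errorMessage":null,"profile":{"fullName":"TestFirstName TestMiddleName TestLastName","memberships":[{"name":"UA Gold Partner","number":"123-456-123-123","scanNumber":"123-456-123-123"}]}}'

# Hypothetical variant with the fields in a different order.
reordered = '{"profile":{"memberships":[{"scanNumber":"123-456-123-123","name":"UA Gold Partner"}]},"resultCode":"SUCCESS"}'

SCAN_NUMBER = re.compile(r'"scanNumber"\s*:\s*"([^"]*)"')


def extract_scan_number(text):
    """Return the first scanNumber value found in text, or None."""
    match = SCAN_NUMBER.search(text)
    return match.group(1) if match else None


from_original = extract_scan_number(body)
from_reordered = extract_scan_number(reordered)
```

This only finds the first occurrence and knows nothing about nesting, so it is best suited to responses where the key appears once.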

How to use regex in selenium locators

I'm using selenium RC and I would like, for example, to get all the links elements with attribute href that match:
http://[^/]*\d+com
I would like to use:
sel.get_attribute( '//a[regx:match(@href, "http://[^/]*\d+.com")]/@name' )
which would return a list of the name attribute of all the links that match the regex.
(or something like it)
thanks
The answer above is probably the right way to find ALL of the links that match a regex, but I thought it'd also be helpful to answer the other part of the question, how to use regex in Xpath locators. You need to use the regex matches() function, like this:
xpath=//div[matches(@id,'che.*boxes')]
(this, of course, would click the div with 'id=checkboxes', or 'id=cheANYTHINGHEREboxes')
Be aware, though, that the matches function is not supported by all native browser implementations of Xpath (most conspicuously, using this in FF3 will throw an error: invalid xpath[2]).
If you have trouble with your particular browser (as I did with FF3), try using Selenium's allowNativeXpath("false") to switch over to the JavaScript Xpath interpreter. It'll be slower, but it does seem to work with more Xpath functions, including 'matches' and 'ends-with'. :)
You can use the Selenium command getAllLinks to get an array of the ids of links on the page, which you could then loop through and check the href using getAttribute, which takes the locator followed by an @ and the attribute name. For example in Java this might be:
String[] allLinks = session().getAllLinks();
List<String> matchingLinks = new ArrayList<String>();
for (String linkId : allLinks) {
    String linkHref = selenium.getAttribute("id=" + linkId + "@href");
    if (linkHref.matches("http://[^/]*\\d+.com")) {
        matchingLinks.add(linkId);
    }
}
A possible solution is to use sel.get_eval() and write a JS script that returns a list of the links. something like the following answer:
selenium: Is it possible to use the regexp in selenium locators
Here's some alternate methods as well for Selenium RC. These aren't pure Selenium solutions, they allow interaction with your programming language data structures and Selenium.
You can also get the HTML page source, then run a regular expression over the source to return a match set of links. Use regex grouping to separate out URLs, link text/ID, etc., and you can then pass them back to Selenium to click on or navigate to.
Another method is to get the HTML page source or innerHTML (via DOM locators) of a parent/root element, then convert the HTML to XML as a DOM object in your programming language. You can then traverse the DOM with the desired XPath (with regular expressions or not), and obtain a nodeset of only the links of interest. From there, parse out the link text/ID or URL and pass it back to Selenium to click on or navigate to.
Upon request, I'm providing examples below. It's mixed languages since the post didn't appear to be language specific anyways. I'm just using what I had available to hack together for examples. They aren't fully tested or tested at all, but I've worked with bits of the code before in other projects, so these are proof of concept code examples of how you'd implement the solutions I just mentioned.
//Example of element attribute processing by page source and regex (in PHP)
$pgSrc = $sel->getPageSource();
//simple hyperlink extraction via regex below, replace with better regex pattern as desired
preg_match_all("/<a.+href=\"(.+)\"/",$pgSrc,$matches,PREG_PATTERN_ORDER);
//$matches is a 2D array, $matches[0] is array of whole string matched, $matches[1] is array of what's in parenthesis
//you either get an array of all matched link URL values in parenthesis capture group or an empty array
$links = count($matches) >= 2 ? $matches[1] : array();
//now do as you wish, iterating over all link URLs
//NOTE: these are URLs only, not actual hyperlink elements
//Example of XML DOM parsing with Selenium RC (in Java)
String locator = "id=someElement";
String htmlSrcSubset = sel.getEval("this.browserbot.findElement(\""+locator+"\").innerHTML");
//using JSoup XML parser library for Java, see jsoup.org
Document doc = Jsoup.parse(htmlSrcSubset);
/* once you have this document object, can then manipulate & traverse
it as an XML/HTML node tree. I'm not going to go into details on this
as you'd need to know XML DOM traversal and XPath (not just for finding locators).
But this tutorial URL will give you some ideas:
http://jsoup.org/cookbook/extracting-data/dom-navigation
the example there seems to indicate first getting the element/node defined
by content tag within the "document" or source, then from there get all
hyperlink elements/nodes and then traverse that as a list/array, doing
whatever you want with an object oriented approach for each element in
the array. Each element is an XML node with properties. If you study it,
you'd find this approach gives you the power/access that WebDriver/Selenium 2
now gives you with WebElements but the example here is what you can do in
Selenium RC to get similar WebElement kind of capability
*/
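The page-source approach can also be sketched in Python with only the standard library. In this example `page_source` is a hard-coded stand-in for whatever Selenium would return, the `LinkCollector` class is a name invented here, and the dot before `com` is escaped, which is an assumption about what the pattern in the question intended.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, name) pairs from every <a> tag in the fed HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr_map = dict(attrs)
            if "href" in attr_map:
                self.links.append((attr_map["href"], attr_map.get("name")))


# Stand-in for the page source that Selenium would return.
page_source = """
<a name="first" href="http://example123.com">one</a>
<a name="second" href="http://example.org/path">two</a>
<a name="third" href="http://site42.com">three</a>
"""

collector = LinkCollector()
collector.feed(page_source)

# The whole href must match, so fullmatch is used instead of search.
href_pattern = re.compile(r"http://[^/]*\d+\.com")
matching_names = [
    name for href, name in collector.links if href_pattern.fullmatch(href)
]
```

The resulting names (or the hrefs themselves) could then be turned back into locators and handed to Selenium, as described above.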
Selenium's By.Id and By.CssSelector methods do not support Regex and By.XPath only does where XPath 2.0 is enabled. If you want to use Regex, you can do something like this:
using System.Collections.Generic;
using System.Text.RegularExpressions;
using OpenQA.Selenium;

void MyCallingMethod(IWebDriver driver)
{
    //Search by ID:
    string attrName = "id";
    //Regex = 'a number that is 1-10 digits long'
    string attrRegex = "[0-9]{1,10}";
    SearchByAttribute(driver, attrName, attrRegex);
}
IEnumerable<IWebElement> SearchByAttribute(IWebDriver driver, string attrName, string attrRegex)
{
    List<IWebElement> elements = new List<IWebElement>();
    //Allows spaces around equal sign. Ex: id = 55
    string searchString = attrName + "\\s*=\\s*\"" + attrRegex + "\"";
    //Search page source
    MatchCollection matches = Regex.Matches(driver.PageSource, searchString, RegexOptions.IgnoreCase);
    //iterate over matches
    foreach (Match match in matches)
    {
        //Get exact attribute value
        Match innerMatch = Regex.Match(match.Value, attrRegex);
        //Build the selector from the matched value (quoted), not from the regex itself
        string cssSelector = "[" + attrName + "='" + innerMatch.Value + "']";
        //Find element by exact attribute value
        elements.Add(driver.FindElement(By.CssSelector(cssSelector)));
    }
    return elements;
}
Note: this code is untested. Also, you can optimize this method by figuring out a way to eliminate the second search.