RGoogleAnalytics replacing unexpectedly escaped characters with gsub


I'm using RGoogleAnalytics and am still at the learning stage.
I'm following the code in the tutorial here: https://code.google.com/p/r-google-analytics/
But when I try to run
ga.goals <- conf$GetGoals()
ga.goals
I get an error message telling me there is an unexpected escaped character '\.' at pos 7.
I get a similar message for the next two lines of code (GetSegments).
This question deals with a similar problem in the Facebook Graph API:
How to replace "unexpected escaped character" in R
I've tried using a similar bit of code
confGoalsSub <- gsub('\\.', ' ', conf$GetGoals())
to remove the escaped characters, but I get another error:
cannot coerce type 'closure' to vector of type 'character'
Out of desperation I have tried confGoalsSub <- gsub('\\.', ' ', conf) which returns a character vector that is just garbage (it's just the code for conf with the decimal points stripped out).
Can anyone suggest a better expression than gsub that will return a useful object?
EDIT: As per the suggestion below, I've now added the brackets at the end of the function call, but I still get the same error message about unexpected escaped characters. I get the same error when I try to call other, similar functions such as $GetSegments().
I saw on one video at the weekend that this package was broken for a long time, although the speaker did not provide details as to why. Perhaps I should give up and try one of the other Google Analytics packages in R.
Seems odd, given that this one is supposed to be Google supported.

I think this error arises when the rjson library isn't able to parse the Google Analytics Data Feed properly and convert it into a nested list. The updated version of RGoogleAnalytics (http://cran.r-project.org/web/packages/RGoogleAnalytics/index.html) fixes this problem. Currently, you won't be able to retrieve Goals and Segments from your Google Analytics account using the library, but beyond that it supports the full range of dimensions and metrics.
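To illustrate the kind of parse failure described above, here is a minimal Python sketch (an analogue only, not RGoogleAnalytics or rjson). The payload is made up; the point is that "\." is not a legal JSON escape, so a strict parser rejects it until the raw text is cleaned, which is different from running gsub on the already-parsed object or on the function itself.

```python
import json
import re

# Hypothetical payload: "\." is not a legal JSON escape sequence.
raw = r'{"name": "goal\.1"}'

try:
    json.loads(raw)
    parsed_ok = True
except json.JSONDecodeError:
    # A strict JSON parser rejects the stray backslash.
    parsed_ok = False

# Drop the backslash in front of the dot in the raw text, then parse again.
cleaned = re.sub(r"\\\.", ".", raw)
goal = json.loads(cleaned)

print(parsed_ok)
print(goal)
```

The cleanup has to happen on the raw response text before parsing. If the package does the download and the parsing in one step, there is no place to hook this in from the outside, which is why an updated package is the practical fix.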

Related

OpenModelica SimulationOptions 'variableFilter' not working with '^' exceptions

To reduce the size of my simulation output files, I want to give variable-name exceptions, instead of a long list of specific variables, to the simulationsOptions/outputFilter (cf. OpenModelica Users Guide / Output) of my model. I found the regexp operator "^" to fulfill my needs, but it didn't work as expected. So I think something is wrong with how connected character strings are interpreted when negated.
Example:
When I have any derivatives der(...) in my model and use variableFilter=der.*, the output file contains all the filtered derivatives. Since there are no other variables beginning with the character d, the same happens with variableFilter=d.*. For testing I also tried variableFilter=rde.* to confirm that every variable is filtered.
When I now try to exclude them with variableFilter=^der.*, =^rde.* or =^d.*, I get exactly the same result as without ^. So the operator seems to be ignored in this notation.
When I instead use variableFilter=[^der].*, =[^rde].* or even =[^d].*, all the derivative variables I wanted to exclude are filtered out of the output, but there is no difference between those three expressions. It seems to me that every character is interpreted on its own and not as a connected string.
Did I understand and use the regexp correctly, or could this be a bug in the code?
Side/follow-up question: Where can I officially report this for software revision?
OpenModelica v.1.19.2 (64-bit)
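The behaviour described in the question matches standard regex semantics, which can be sketched in Python. The variable names are made up, and whether OpenModelica matches the whole name and which regex dialect it uses are assumptions here. Outside brackets, "^" is a start-of-string anchor, not a negation. Inside brackets, "[^der]" means "one character that is not d, e or r", so the order of the letters is irrelevant.

```python
import re

# Hypothetical variable names, standing in for a model's outputs.
names = ["der(x)", "der(y)", "x", "y", "rate", "energy"]

def keep(pattern):
    """Return the names that the pattern matches in full."""
    return [n for n in names if re.fullmatch(pattern, n)]

# "^" outside brackets only anchors the match at the start of the name.
anchored = keep(r"^der.*")

# "[^der]" is a negated character class: ONE character that is not d, e or r.
negated_class = keep(r"[^der].*")
same_class = keep(r"[^rde].*")

# A negative lookahead expresses "does not start with the string der".
lookahead = keep(r"(?!der).*")

print(anchored)
print(negated_class)
print(lookahead)
```

Note that the negated class also drops "rate" and "energy", because they start with r and e. The lookahead form keeps them, but lookaheads are a Perl-style extension that POSIX-style regex engines do not provide, so it may not be available in variableFilter.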

Can I change the regex on NagVis and if yes, how can I do this?

I have a problem with NagVis. I created several maps with the locations of hosts and used the service lines to display the bandwidth and utilization of individual interfaces. It all worked well until we switched to CheckMK 2.0. We have renamed the interfaces, and in theory it should not be a problem to simply transfer the new names to NagVis.
However, the regex error mentioned below occurs. I checked the new label against the regex using regex101 and found that the label has changed. It is structured according to the pattern: 'Interface_Name "Interface description"'. NagVis's regex doesn't allow quotation marks, so the new interface name is rejected.
I'm relatively new to this and haven't had much to do with it before. One solution would be to escape the quotation marks, but I don't know where to do that. If you have any suggestions for a solution, I would be very grateful.
If you have any questions, just ask.
CMK version: 2.0.0p26
OS version: Windows 10
Error message: The attribute has the wrong format (Regex: /^[0-9a-zа-яё\p{L}\s:+_.,'-*?!##=/]+ $/u).

Google data studio : Error in formula for custom fields with REGEXP_MATCH

I'm currently playing with Google Data Studio and I'm getting an error that I can't get rid of.
I'm trying to create a custom field that will store some values depending on the result of my regex; see the code below:
GDS is not accepting this formula; I'm getting the error: Invalid Formula.
The documentation for REGEXP_MATCH also says that it returns true or false, but when I just output the result of the regex '.' (looking for any character), I do not get either of these values. Instead it shows me {$theCharacterFound}, e.g. {A}.
I hope someone will be able to tell me what I am doing wrong!
EDIT: I found out in this topic that it is apparently a problem with the PostgreSQL connector (which I'm using), so we can only hope that Google will fix it...

I think you are missing an r before the regex literal:
CASE
WHEN REGEXP_MATCH(my_field_text, r'\bWord1\b') THEN 'True'
WHEN REGEXP_MATCH(my_field_text, r'\bWord2\b') THEN 'False'
ELSE NULL
END
Note also that I placed word boundaries around your search terms. This will prevent Word1 from matching a substring of a larger string, e.g. AWord1s, which you might not want to count as a match.
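The effect of the word boundaries can be checked with a small Python sketch. This only illustrates what \b does in a regex; it does not emulate Data Studio's REGEXP_MATCH, and the sample strings are made up.

```python
import re

bounded = re.compile(r"\bWord1\b")

# Word1 as a standalone word: both boundaries are satisfied.
standalone = bool(bounded.search("some Word1 here"))

# Word1 embedded in a longer word: no boundary between "A" and "W".
embedded = bool(bounded.search("AWord1s"))

# Without the boundaries, the embedded occurrence is found.
unbounded = bool(re.search(r"Word1", "AWord1s"))

print(standalone, embedded, unbounded)
```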

How to find and replace box character in text file?

I have a large text file that I'm going to be working with programmatically, but I have run into problems with a special character strewn throughout the file. The file is far too large to scan by eye for specific characters. I've been able to get rid of most of the other unwanted special characters using regex patterns. But there is a box character, similar to "□". When I tried to copy the character from the actual text file and paste it here I got "�", so the example of the box is from the Windows Character Map, which gives the code 'U+25A1'. I'm not sure how to interpret that or whether it's something I could use for a regex search.
Would anyone know how I could search for the box symbol similar to "□" in a UTF-8 encoded file?
EDIT:
Here is an example from the text file:
"� Prune palms when flower spathes show, or delay pruning until after the palm has finished flowering, to prevent infestation of palm flower caterpillars. Leave the top five rows."
The only problem is that, as mentioned in the original post, the square gets converted into a diamond question mark.

It's unclear where and how you are searching, although you could use the hex equivalent:
\x{25A1}
Example:
https://regex101.com/r/b84oBs/1

The black diamond with a question mark is not a character, per se. It is what a browser spits out at you when you give it unrecognizable bytes.
Find out where that data is coming from.
Determine its encoding. (Usually UTF-8, but might be something else.)
Be sure the browser is configured to display that encoding. This is likely to suffice: <meta charset=UTF-8> in the header of the page.
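How unrecognizable bytes turn into the diamond can be reproduced with a short Python sketch. The byte 0x95 is only a guess for illustration (it is a bullet in Windows-1252); the real encoding of the file in the question is not known.

```python
# Hypothetical input: 0x95 is a bullet in Windows-1252,
# but on its own it is not a valid UTF-8 sequence.
raw = b"\x95 Prune palms"

# Decoding with the wrong encoding substitutes U+FFFD (the replacement character).
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were written in recovers the character.
as_cp1252 = raw.decode("cp1252")

print(repr(as_utf8))
print(repr(as_cp1252))
```

Once the substitution has happened, the original byte is gone, so it is better to find the right encoding than to repair the text afterwards.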

I found a workaround using Notepad++ and this website. It's still not clear what encoding system the square is originally from, but when I post it into the query field in the website above or into the Notepad++ Conversion Table (Plugins > Converter > Conversion Table) it gives the hex-character code for the "Replacement Character" which is the diamond with the question mark.
Using this code in a regular expression, \x{FFFD}, within Notepad++ search gave me all the squares, although it recognizes them as the Replacement Character.
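The same search can be done programmatically. A minimal Python sketch, assuming the text has already been decoded to a string; the sample sentence is shortened from the question and the choice to delete the characters is arbitrary:

```python
import re

# Sample text containing both the white square (U+25A1)
# and the replacement character (U+FFFD).
text = "\u25a1 Prune palms when flower spathes show. \ufffd Leave the top five rows."

# Character class matching either code point.
box_or_replacement = re.compile("[\u25a1\ufffd]")

# Where the characters occur, and the text with them removed.
positions = [m.start() for m in box_or_replacement.finditer(text)]
cleaned = box_or_replacement.sub("", text)

print(positions)
print(cleaned)
```

For a very large file, the same substitution can be applied line by line while reading, so the whole file never has to be held in memory.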

Are my regex just wrong or is there a buggy behaviour in td-agent's format behaviour?

I am using fluentd, elasticsearch and kibana to organize logs. Unfortunately, these logs are not written using any standard like apache, so I had to come up with the regex for the format myself. I used this site here to verify that they are working: http://fluentular.herokuapp.com/ .
The logs have roughly this format here:
DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] Start Activitiy 'barbecue' zu verabeiten.
The format regex I am using is as follows:
format /(?<pri>([INFO]|[DEBUG]|[ERROR])+)...(?<date>(\d{2}\.\d{2}\.\d{4})).(?<time>(\d{2}:\d{2}:\d{2})).\[(?<subject>(.*))\].(?<msg>(.*))/
Now, judging by that website that is supposed to test specifically fluentd's behaviour with regexes, the output SHOULD be this one:
Record
Key Value
pri DEBUG
date 24.04.2014
subject SingleActivityStrategy
msg Start Activitiy 'barbecue' zu verabeiten.
Instead, though, I have this ?bug? where pri is always shortened to DEBU. The same goes for ERROR, which becomes ERRO; only INFO stays INFO. I am not very experienced with regular expressions and I find it hard to believe that this is a bug, but it still confuses me, and any help is greatly appreciated.
I'm not sure I can link the complete config file because I don't personally own these log files, and I am trying to keep this at a level where my boss won't get mad at me for posting sensitive information. Should it definitely be needed, I will post it later, after asking him how much I can reveal.
In general, the logs always look roughly like this:
First the priority, which is either DEBUG, ERROR or INFO, next the date, next what we call the subject, which is always written in [ ], and finally just a message.
Here is a link to Fluentular with the format I am using and a test string that produces the right result in Fluentular, but not in my config file:
Fluentular
Sorry I couldn't make it work like a regular link to just click on.
Another link to test out regex with my format and test string is this one:
http://rubular.com/r/dfXOkQYNXP
tl;dr version:
my td-agent format regex cuts off the last letter, although fluentular says it shouldn't. My fault or a bug?

How the regex would look if you're trying to match the data specifically:
(INFO|DEBUG|ERROR)\:\s+(\d{2}\.\d{2}\.\d{4})\s(\d{2}:\d{2}:\d{2})\s\[(.*)\](.*)
In your format string, you were using . and ... where your spaces and colon should be. I'm not too sure why this works in Fluentular, but you should match the \: explicitly, as well as each space between the values.
So you'd be looking at the following regular expression with the Fluentd fields (which are grouping names):
(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))
Meaning your td-agent.conf should look like:
<source>
  type tail
  path /var/log/foo/bar.log
  pos_file /var/log/td-agent/foo-bar.log.pos
  tag foo.bar
  format /(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))/
</source>
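The truncation to DEBU can be reproduced outside td-agent. Below is a rough Python sketch: Ruby's (?<name>...) groups are rewritten as Python's (?P<name>...), and the sample line is the one from the question. In the original pattern, [DEBUG] is a character class (any one of D, E, B, U, G), and the three dots after it must consume three characters before the date. With a single space after the colon, the engine has to give back the final G so that "G: " fills the three dots.

```python
import re

line = ("DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] "
        "Start Activitiy 'barbecue' zu verabeiten.")

# The question's pattern, with named groups in Python syntax.
original = re.compile(
    r"(?P<pri>([INFO]|[DEBUG]|[ERROR])+)...(?P<date>(\d{2}\.\d{2}\.\d{4}))."
    r"(?P<time>(\d{2}:\d{2}:\d{2})).\[(?P<subject>(.*))\].(?P<msg>(.*))"
)

# The answer's approach: literal alternatives, explicit colon and whitespace.
fixed = re.compile(
    r"(?P<pri>INFO|ERROR|DEBUG):\s+(?P<date>\d{2}\.\d{2}\.\d{4})\s"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s\[(?P<subject>.*)\]\s(?P<msg>.*)"
)

original_pri = original.search(line).group("pri")
fixed_fields = fixed.search(line).groupdict()

print(original_pri)
print(fixed_fields)
```

With the original pattern pri comes out as DEBU; with the explicit pattern it is DEBUG and the other fields are unchanged. If INFO lines carry an extra space after the colon to align the columns, that would also explain why INFO was not truncated, though that is a guess about the log layout.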
I would also take a look into comparing Logstash vs. Fluentd. I like Logstash far more because you create Grok filters to match the type of data you want, and it makes formatting your fields much easier because you are providing an abstraction layer, but you essentially will get the same data.
And I would watch out when you're using sites like Rubular, as they are fairly particular about multi-line matching and the like. I'd suggest something like Regexr which gives immediate feedback and you can set global and multiline matching as well.
