Laravel: multi-language, display a word in two languages together in the same <div> - laravel-5.5

I have a list of words translated into several languages in JSON files in the lang folder. How can I get, for instance, both the French and Spanish translated strings together in a single <div>?

Related

Ignore tags and javascript with regex

I'm trying to perform a regex replacement on the HTML below. I'm using an existing (I didn't write it and don't really understand it) regex pattern that ignores anything inside of an HTML tag, but I need it to also ignore anything between script tags. The pattern is (?<!<[^>]*)(diversity|and|inclusion). The problem is that the and in 'playerBrandingId' in the javascript is getting matched and ultimately replaced. In case it matters, I'm using C#. You can see what I get here.
<p>When it comes to building more diverse and inclusive workforces, the sports industry is already a leader, but it can do much more. One of the ways SBD/SBJ is focusing on diversity and inclusion is by talking to business leaders about what the industry can do better. In our first video in the “SBJ Diversity and Inclusion” series, we hear from execs working in leagues, technology, recruitment and academia.</p>
<div class="article-offset-block article-video article-offset-block--half">
<div class="u-vr2">
<div id='video-F17F523A70EB43ECAF54DF46144835B4'></div>
</div>
</div>
<script>
var playerParam = {
'pcode': 'poeXI63BtIsR_ugBoy3Z6X8KfiMo',
'playerBrandingId': 'video-F17F523A70EB43ECAF54DF46144835B4',
'autoplay': false,
'loop': false
};
OO.ready(function () { window.ppF17F523A70EB43ECAF54DF46144835B4 = OO.Player.create('video-F17F523A70EB43ECAF54DF46144835B4', 'w5cW9qZTE6qRRDqfBdi861XWJTXci9uE', playerParam); });
</script>
EDIT:
The pattern is generated by a user's query, so the pattern could include the word window or player which would be matched in the javascript when I change the pattern to include the \b like so: (?<!<[^>]*)\b(window|player|and)\b
Change your regex to (?<!<[^>]*)\b(diversity|and|inclusion)\b. The \b adds a test for a word boundary, forcing each word inside the ( and ) to match only as a whole word.
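To see what the \b change buys you, here is a minimal sketch in Python rather than C#. It is only an illustration: Python's re module does not accept the variable-length lookbehind (?<!<[^>]*) that .NET allows, so the sketch drops it and shows the word-boundary part alone. The names matched_words, prose and script_line are made up for the example.

```python
import re

# Word-boundary version of the alternation. The .NET lookbehind is left out
# because Python's re module only supports fixed-width lookbehind.
pattern = re.compile(r"\b(diversity|and|inclusion)\b")

def matched_words(text):
    """Return every whole-word match found in text."""
    return [m.group(0) for m in pattern.finditer(text)]

prose = "focusing on diversity and inclusion"
script_line = "'playerBrandingId': 'video-F17F523A70EB43ECAF54DF46144835B4',"

print(matched_words(prose))        # whole words in ordinary prose
print(matched_words(script_line))  # 'and' inside 'Branding' is not a whole word
```

This stops the match inside playerBrandingId, but it does nothing for a query word that appears as a whole word inside the script, which is the case raised in the question's edit.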
EDIT:
You are trying to parse the HTML to extract the text nodes and then check them.
You should not under any circumstances try to parse HTML with a regex, unless you wish to invoke rite 666 Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn.
Use an HTML parsing library instead: see this page for some ways to do it, or search for extracting text nodes from HTML with .NET and C#.
The answer is that you cannot do what I'm trying to do with Regex according to this.
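For what the parser-based approach might look like, here is a rough sketch using Python's standard html.parser in place of a .NET library. It rebuilds the markup and applies the user's word list only to text nodes that are not inside script or style elements. The class name TextNodeHighlighter, the highlight helper and the <b> wrapper are all assumptions made for illustration; a real C# version would use whatever HTML parsing library you pick.

```python
import re
from html.parser import HTMLParser

class TextNodeHighlighter(HTMLParser):
    """Rebuilds the markup, rewriting matches only in visible text nodes."""

    def __init__(self, pattern, wrap):
        super().__init__(convert_charrefs=False)
        self.pattern = pattern
        self.wrap = wrap        # function applied to each matched word
        self.out = []
        self.skip_depth = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # tag text copied verbatim

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            data = self.pattern.sub(lambda m: self.wrap(m.group(0)), data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

def highlight(html_text, words):
    """Wrap each whole-word occurrence of words in <b>, outside tags and scripts."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")
    parser = TextNodeHighlighter(pattern, lambda w: f"<b>{w}</b>")
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)

html_doc = (
    "<p>leagues, technology, recruitment and academia.</p>"
    "<script>window.player = {'playerBrandingId': 'video-1'};</script>"
)
result = highlight(html_doc, ["window", "player", "and"])
print(result)
```

Because tag text is copied through untouched and script content is skipped, query words such as window or player no longer leak into the JavaScript. The sketch ignores comments and doctype declarations, so treat it as a starting point only.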

Using wildcards in Regex

I am parsing strings in an html page, and I can get multiple matches for specific strings. I am trying to identify when the strings come after a specific word(s) in the text so I can reject them.
For instance say I am trying to extract a phone # from a page. There may be a few but I don't want the one that comes after "Copyright". Since this can be constructed any way and since the #s I want will come before I wanted to do something like (realizing this is a totally imperfect phone # just using as example)
((Copyright|©)(*))?([0-9]\d{2,3}(-)[0-9]\d{2,3}(-)[0-9]\d{3,4})
I get the * is not the correct way to do wildcards but the larger question is how can I set this up so when capturing a phone # I also capture Copyright if it comes before it anywhere which would include:
Copyright 1972 Acme Corp 555-555-5555
and
Copyright held by Acme Corp
123 West Street
NY, NY 10019
Bla bla
questions call us at 555-555-5555
Ideally what I want to capture is 'Copyright' and '555-555-5555' w/o the wildcard text between. This way any phone #s I capture with Copyright I can reject.
Somewhat OT I understand I could also do something like
(?P<Copyright>(Copyright|Trademark|©))(?P<Wildcard>(*))(?P<NUMBER>([0-9]\d{2,3}(-)[0-9]\d{2,3}(-)[0-9]\d{3,4}))
to make identification easier later on.
In any event, my goal is the easiest way to identify, after the fact, a phone number that occurs at any point in the HTML after the term Copyright so I can reject it.
This type of information extraction problem will be extremely difficult (if not impossible) to solve using only regular expressions.
If at all possible, you should pre-process your document before attempting to extract the phone numbers.
Some things to consider:
strip all HTML markup (i.e. remove all markup tags and replace each with a space)
convert & normalize all white-space
The resulting text could then be matched using a regular expression.
Here is an example of what this pre-processing step would do to a document:
<html>
<head>
</head>
<body>
<p style="some css style etc">some <em>arbitrary</em> text here.</p>
<div>
<div>
More complex html nested
tags
</div>
with arbitrary white space including tabs and
new lines.
</div>
<footer class="footer_class">
<p style="css style">Copyright (c) Acme Corporation</p>
<p style="css style">123 West Street<br/>NY, NY 10019<br/>Bla bla</p>
<p style="some other css style">questions call us at 555-555-5555</p>
</footer>
</body>
</html>
After pre-processing:
some arbitrary text here. More complex html nested tags with arbitrary white
space including tabs and new lines. Copyright (c) Acme Corporation 123 West
Street NY, NY 10019 Bla bla questions call us at 555-555-5555
Notice that this way you get a solid block of text. You may want to design some rules for breaking this single-line text block into multiple lines in order to make it easier to recognize when the information you're searching for is connected with certain keywords.
You could also look at the distance between a keyword and the information you're looking for and use that as a heuristic as well.
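The pre-processing and the keyword heuristic could be sketched roughly as follows, in Python with only the standard library. Everything named here (TextExtractor, to_plain_text, classify_phones, the simplified NNN-NNN-NNNN phone pattern and the sample markup) is an assumption for illustration, not part of the question.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes, replacing every tag with a space."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(" ")

    def handle_endtag(self, tag):
        self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)

def to_plain_text(html_text):
    """Strip markup, then collapse all runs of whitespace to single spaces."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()

# Simplified, assumed phone format: NNN-NNN-NNNN.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
KEYWORD = re.compile(r"Copyright|©")

def classify_phones(text, max_distance=None):
    """Return (number, rejected) pairs. A number is rejected when a keyword
    appears before it, within max_distance characters if that is given."""
    results = []
    for m in PHONE.finditer(text):
        keyword_ends = [k.end() for k in KEYWORD.finditer(text, 0, m.start())]
        rejected = any(
            max_distance is None or m.start() - end <= max_distance
            for end in keyword_ends
        )
        results.append((m.group(0), rejected))
    return results

sample = """<body><p>Call sales at 212-555-0100</p>
<footer><p>Copyright (c) Acme Corporation</p>
<p>123 West Street<br/>NY, NY 10019</p>
<p>questions call us at 555-555-5555</p></footer></body>"""

text = to_plain_text(sample)
print(text)
print(classify_phones(text))
print(classify_phones(text, max_distance=20))
```

With no distance limit, any number after Copyright is rejected. Passing max_distance turns the check into the distance heuristic, so a number far from the keyword is kept.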

How can I use Regular Expression to replace &#98; with respective ascii character?

I wrote a VB .Net application that asks the user for a URL, then the application will pull the HTML content of that URL and filters out most stuff except for anything between <td> </td> tags.
So if the HTML of that url is something like this
<html><body><table><tr><td>My content here</td></tr></table>
</body>
</html>
then the application will simply print out:
My content here
However, the problem is some URLs have populated these <td></td> with the ascii codes of the letters rather than letters themselves, so here is an example:
<html><body><table><tr><td>">bandit at</td></tr></table>
</body>
</html>
so my program will display:
'bandit'
but any browser will display the above as
bandit
I tried to use RegEx to replace those numbers to their respective characters (using 'Chr' function), but I failed.
Here is what I tried:
Me.TextBox3.Text = Regex.Replace(htmlDoc, "&#\d\d\d;", chr("$&"))
but that presents an error.
My question is: how can I replace all occurrences of &#\d\d\d; with Chr(value of the \d\d\d that was matched earlier)?
This can be achieved easily by using the HtmlDecode method.
http://social.msdn.microsoft.com/Forums/vstudio/en-US/5cd2251d-1359-49ce-b6a2-7ca492d560a5/converting-nbsp-when-using-serverurldecode?forum=csharpgeneral
string subject = HttpUtility.HtmlDecode(HttpUtility.UrlDecode(Request.QueryString["subject"]));
This is C#, but you can easily convert it to VB.NET.
You can use HttpUtility.HtmlDecode to decode HTML into a plain string.
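As for why the original attempt fails: Chr("$&") is evaluated once, before Regex.Replace runs, so it never sees the matched digits. The replacement has to be computed per match, which in .NET means the Regex.Replace overload that takes a MatchEvaluator. The sketch below shows the same idea in Python (a callback passed to re.sub), next to the library decoder, which is the simpler route. The helper name decode_numeric_refs and the sample string are assumptions for illustration.

```python
import html
import re

def decode_numeric_refs(text):
    """Replace each decimal reference such as &#98; with its character.
    The callback runs once per match, so it sees the digits that matched."""
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)

encoded = "<td>&#98;&#97;&#110;&#100;&#105;&#116;</td>"

print(decode_numeric_refs(encoded))          # regex with a per-match callback
print(html.unescape(encoded))                # library decoder, same result here
print(html.unescape("&lt;b&gt; &amp; &#98;"))  # also handles named entities
```

The pattern uses \d+ rather than \d\d\d because some codes have only two digits (98 for b, 97 for a). The library decoder is still the safer choice, since it also covers named and hexadecimal references.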

HTML and Attribute encoding

I came across a post on Meta SO and I'm curious about what are the subtle differences between un-encoded and encoded HTML characters, in HTML attributes, in contexts of: security, best-practice and browser support.
HTML encoding replaces certain characters that are semantically meaningful in HTML markup, with equivalent characters that can be displayed to the user without affecting parsing the markup.
The most significant and obvious characters are <, >, &, and ", which are replaced with &lt;, &gt;, &amp;, and &quot;, respectively. Additionally, an encoder may replace high-order characters with the equivalent HTML entity encoding, so content can be preserved and properly rendered even in the event the page is sent to the browser as ASCII.
HTML attribute encoding, on the other hand, only replaces a subset of those characters that are important to prevent a string of characters from breaking the attribute of an HTML element. Specifically, you'd typically just replace ", &, and < with &quot;, &amp;, and &lt;. This is because the nature of attributes, the data they contain, and how they are parsed and interpreted by a browser or HTML parser is different from how an HTML document and its elements are read.
In terms of how that relates to XSS, you want to properly sanitize strings from an outside source (such as the user) so they don't break your page, or more importantly, inject markup and script that can alter or destroy your application or affect your users' machines (by taking advantage of browser or platform vulnerabilities).
If you want to display user-generated content in your page, you'd HTML encode the string and then display it in your markup, and everything they entered will be displayed literally without worrying about XSS or broken markup.
If you needed to attach user-generated content to an element in an attribute (for example, a tooltip on a link), you'd attribute encode to make sure the content doesn't break the element's markup.
Could you just use the same function for HTML encoding to handle attribute encoding? Technically, yes. In the case of the meta question you linked, it sounds like they were taking HTML that was encoded and decoding it, then using that result as an attribute value, which results in encoded markup being displayed literally, if you follow.
I would recommend looking over OWASP XSS Prevention Rules 1 and 2.
A brief summary...
Rule 1 for HTML
Escape the following characters with HTML entity encoding ...
& --> &amp;
< --> &lt;
> --> &gt;
" --> &quot;
' --> &#x27;
/ --> &#x2F;
Rule 2 for HTML Common Attributes
Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the &#xHH; format (or a named entity if available) to prevent switching out of the attribute. The reason this rule is so broad is that developers frequently leave attributes unquoted. Properly quoted attributes can only be escaped with the corresponding quote. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and |.
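To make the two rules concrete, here is a small sketch in Python. It is an illustration of the summary above, not a vetted security library: encode_html and encode_attribute are hypothetical helper names, and in production you would normally rely on your framework's own encoder.

```python
import html

def encode_html(text):
    """Rule 1: entity-encode & < > " ' and / for use in HTML body content."""
    # html.escape with quote=True covers & < > " and '; the slash is added here.
    return html.escape(text, quote=True).replace("/", "&#x2F;")

def encode_attribute(text):
    """Rule 2: escape every non-alphanumeric character below 256 as &#xHH;."""
    out = []
    for ch in text:
        is_ascii_alnum = ch.isascii() and ch.isalnum()
        if ord(ch) < 256 and not is_ascii_alnum:
            out.append(f"&#x{ord(ch):02X};")
        else:
            out.append(ch)
    return "".join(out)

body = encode_html('<a href="x">Tom & Jerry\'s</a>')
attr = encode_attribute('x" onmouseover="alert(1)')
print(body)
print(attr)
```

The attribute encoder is deliberately broader than the body encoder: because it also escapes spaces, = and parentheses, the injected onmouseover handler cannot break out even if the attribute was left unquoted.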

regular expression to find words that are not links in FrontPage

FrontPage 2003 has an option to Find with regular expressions, and I need to look for words that are not links.
For example, I want to find the word "Download" or "any text download something" when it doesn't have the "a" tag before and after it.
I mean something that doesn't have the tags
<a href> </a>
before and after