Following is a C++ standards committee document. The document number is N3721, which superseded the older N3634.
Obviously, it's easy to track the older documents on a given topic.
However, my question is:
How do I track newer documents on a given topic?
For example, if N3721 is superseded by a newer document, how do I find the newer one?
For the newer proposals (the ones whose numbers start with the letter P) you can use the wg21.link redirect service to obtain the latest document:
wg21.link - WG21 redirect service.
Usage:
- `wg21.link/nXXXX`, `wg21.link/pXXXXrX`: Get paper.
- `wg21.link/pXXXX`: Get latest public revision of paper.
- `wg21.link/std`, `wg21.link/std{11,14,17}`: Get working draft.
- `wg21.link/cwgXXX`, `wg21.link/ewgXXX`, `wg21.link/lwgXXX`, `wg21.link/lewgXXX`, `wg21.link/fsXXX`, `wg21.link/editXXX`: Get issue.
- `wg21.link/pXXXX/issue`: Get issue for paper.
- `wg21.link/*wgXXX/paper`: Get paper for issue.
- `wg21.link/index.json`, `wg21.link/index.ndjson`, `wg21.link/index.txt`, `wg21.link/specref.json`: Get everything.
- `wg21.link/`: Get usage.
- `wg21.link/<something else>`: Get 404.
- If you're Slackbot or Twitterbot: Get OpenGraph metadata instead.
For example, for P0476: Bit-casting object representations, if we use wg21.link/P0476 we obtain the latest revision, which is P0476R2.
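If you want to resolve such a short link programmatically, the redirect itself is the answer. Below is a minimal sketch, assuming Python with the requests package (neither is mentioned in the original answer): it asks wg21.link for P0476 without following the redirect and prints where the redirect points, which is the latest public revision.

```python
# Minimal sketch: resolve a wg21.link short link to its latest revision.
# Assumes the third-party `requests` package is installed.
import requests

# Ask for the redirect target without following it.
resp = requests.head("https://wg21.link/p0476", allow_redirects=False)

# The Location header holds the URL of the latest public revision,
# e.g. a link ending in p0476r2 at the time this answer was written.
print(resp.status_code, resp.headers.get("Location"))
```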
In my answer to How does the standards committee indicate the status of a paper under consideration? I go into more detail about the WG21 site and what documents you can find there.
Use the "Get everything" link for pre-P proposals
If we use the wg21.link redirect service's "Get everything" link, we can do a text search for the paper title. For your example, Improvements to std::future<T> and Related APIs, we can see the last document in the series is N3857:
"N3857": {
"type": "paper",
"title": "Improvements to std::future and Related APIs",
"subgroup": "Concurrency",
"author": "N. Gustafsson, A. Laksberg, H. Sutter, S. Mithani",
"long_link": "http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3857.pdf",
"link": "https://wg21.link/n3857",
"source": "http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/",
"date": "2014-01-16"
},
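To automate that text search, the same index is available as JSON via the wg21.link/index.json link from the usage list above. Here is a rough sketch, assuming Python with the requests package and assuming the index is a JSON object keyed by document number, as the excerpt above suggests; the "title" and "date" field names are taken from that excerpt:

```python
# Sketch: search the wg21.link index for every document whose title
# contains a phrase, and print them oldest to newest so the most
# recent paper on the topic appears last.
import requests

index = requests.get("https://wg21.link/index.json").json()
topic = "improvements to std::future"

matches = [(num, entry) for num, entry in index.items()
           if topic in entry.get("title", "").lower()]
for num, entry in sorted(matches, key=lambda kv: kv[1].get("date", "")):
    print(entry.get("date"), num, entry.get("title"))
```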
Steps to reproduce:
1. First time: manage the document using dls:document-insert-and-manage.
2. Update the same document using xdmp:document-insert.
3. The document is lost from the DLS latest-version collection: cts:search(/scopedIntervention/id, dls:documents-query()) no longer returns it.
First time: manage the document.
```xml
<scopedIntervention>
    <id>someId12345</id>
    <scopedInterventionName>First Name</scopedInterventionName>
    <forTestOnly>true</forTestOnly>
    <inactive>true</inactive>
</scopedIntervention>
```
**Document inserted with versioning**
Verify the document is present in the latest-documents collection:
`cts:search(/scopedIntervention/id, dls:documents-query())`
Document is present in the managed "latest" collection.
Update the same document
```xml
<scopedIntervention>
    <id>someId12345</id>
    <scopedInterventionName>Updated Name</scopedInterventionName>
    <forTestOnly>true</forTestOnly>
    <inactive>true</inactive>
</scopedIntervention>
```
**Update document to same URI using xdmp:document-insert**
Again, verify whether the document is present in the latest-documents collection:
`cts:search(/scopedIntervention/id, dls:documents-query())`
Document is NOT present in the managed "latest" collection (lost from the collection).
After applying the DLS upgrade with the following step, the same document shows up in the list again:
```xquery
xquery version "1.0-ml";
import module namespace dls = "http://marklogic.com/xdmp/dls"
    at "/MarkLogic/dls.xqy";

dls:set-upgrade-status(fn:false()),
dls:start-upgrade(),
fn:doc("http://marklogic.com/dls/upgrade-task-status.xml"),
dls:latest-validation-results(),
dls:set-upgrade-status(fn:true())
```
Update the same document using xdmp:document-insert
You are most likely removing the DLS Latest collection at this step. Further, version history is not preserved when you do this.
Instead of using xdmp:document-insert, you should use dls:document-checkout-update-checkin, which checks the document out, applies your update, and checks it back in as a new managed version.
Please read to the end: if you did NOT do a DLS upgrade on an upgraded MarkLogic version, STOP NOW and follow the upgrade instructions. Not doing so will leave DLS in an unstable state, and anything else you do will make things much harder to repair.
+1 Rob. #IAM, regardless of whether it 'worked' or appeared to 'work' in V7, DLS was not designed to handle the case you describe. The DLS architecture depends on encapsulating all changes to documents within the checkin/checkout semantics. If you bypass that, you might as well bypass DLS entirely, because it won't work. The fact that it was 'working' in V7 is misleading: it may simply not have misbehaved in ways your application cared about, or your code may have coincidentally done sufficiently similar work to the internals. You might get lucky and find a way to do so again, but I encourage you to consider how to work within the defined behaviour of the library, or to refactor the parts of your code that are not 'DLS friendly' so they operate between checkout/checkin windows. Not all updates have to go through checkout-update-checkin: you can check out, do whatever you need, then check in.
As a migration workaround you MIGHT be able to make use of the upgrade functions added to dls on an ongoing basis.
See https://docs.marklogic.com/dls:start-upgrade
In V9 (I believe), significant non-backwards-compatible changes were made to the DLS internals that require running this code one time.
The assumption was an in-place upgrade from the prior DLS to the current one. However, the code may also happen to work on an ongoing basis, depending on the details of exactly what your application code is doing that the DLS code doesn't know about.
The 'new' DLS code adds an internal collection to optimize the common case of searching for 'latest' documents; if that collection is dropped, those documents will not show up in DLS searches for 'latest'.
You mention your code is 'migration scripts'. If these are migrating from V7 to V10, then you could run your code before the V10 update, then run the V10 update, then run the DLS upgrade. After that the documents should be in good shape, as long as you don't do anything else that is not defined behaviour for managed documents.
I am told that the following list of "puppy" image URLs is from ImageNet.
https://github.com/asharov/cute-animal-detector/blob/master/data/puppy-urls.txt
How do I download another category, e.g. "cats"?
Where can I get the entire list of ImageNet categories along with their explanations in CSV?
Unfortunately, ImageNet is no longer as easily accessible as it previously was. You now have to create a free account, and then request access to the database using an email address that demonstrates your status as a non-commercial researcher. Following is an excerpt of the announcement posted on March 11, 2021 (it does not specifically address the requirements to obtain an account and request access permission, but it explains some of their reasons for changing the website).
We are proud to see ImageNet's wide adoption going beyond what was originally envisioned. However, the decade-old website was burdened by growing download requests. To serve the community better, we have redesigned the website and upgraded its hardware. The new website is simpler; we removed tangential or outdated functions to focus on the core use case—enabling users to download the data, including the full ImageNet dataset and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
ORIGINAL ANSWER (LINKS NO LONGER VALID):
You can interactively explore available synsets (categories) at http://www.image-net.org/explore, each synset page has a "Downloads" tab where you can download category image URLs.
Alternatively, you can use the ImageNet API. You can download image URLs for a particular synset using the synset id or wnid. The image URL download link below uses the wnid n02121808 for domestic cat, house cat, Felis domesticus, Felis catus.
http://www.image-net.org/api/text/imagenet.synset.geturls?wnid=n02121808
You can find the wnid for a particular synset using the explore link above (the id for a selected synset will be displayed in the browser address bar).
You can retrieve a list of all available synsets (by id) from:
http://www.image-net.org/api/text/imagenet.synset.obtain_synset_list
You can retrieve the words associated with any synset id as follows (another cat example).
http://www.image-net.org/api/text/wordnet.synset.getwords?wnid=n02121808
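Putting the above together, downloading one category with the original API would have looked roughly like the sketch below. This assumes Python with the requests package; the endpoint and the wnid n02121808 are the ones quoted above, and as noted these links are no longer valid:

```python
# Sketch: fetch the (historical) ImageNet URL list for one synset and
# download the images into a local folder named after the wnid.
import os
import requests

wnid = "n02121808"  # domestic cat, house cat, Felis domesticus, Felis catus
list_url = f"http://www.image-net.org/api/text/imagenet.synset.geturls?wnid={wnid}"
urls = requests.get(list_url).text.splitlines()

os.makedirs(wnid, exist_ok=True)
for i, url in enumerate(urls):
    try:
        img = requests.get(url, timeout=10)
        img.raise_for_status()
        with open(os.path.join(wnid, f"{i}.jpg"), "wb") as f:
            f.write(img.content)
    except requests.RequestException:
        pass  # many of the original image URLs are dead; skip failures
```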
Or you can download a smaller version of ImageNet, mini-ImageNet:
- https://github.com/yaoyao-liu/mini-imagenet-tools
- https://github.com/dragen1860/LearningToCompare-Pytorch/issues/4
- https://github.com/twitter/meta-learning-lstm/tree/master/data/miniImagenet
You can easily use the Python package MLclf to download and transform the mini-ImageNet data for the traditional image classification task or the meta-learning task. Just use:
pip install MLclf
For more details, see:
https://pypi.org/project/MLclf/
Here is the scenario. I have an XML document which contains tags. I want to create a transform that does this
```
<tag>content A</tag>          1. content A
<tag>content B</tag>  ---->   2. content B
<tag>content C</tag>          3. content C
```
but only if the tag contents appear on the same physical page. The numbering should restart on each new page. Is there any way to do this using XSL-FO? I know with LaTeX the only way to accomplish something like this is to run LaTeX twice, with the interim document used to determine content page placement.
As far as I can tell (and as confirmed by the Antenna House tech support team), there is no way to do this using standard XSL-FO. Antenna House offers <axf:footnote*/> extensions which include the ability to set an axf:footnote-number-reset="page" attribute, and as suggested in the comments, RenderX offers a generic mechanism which might be used for this purpose, but both of these involve vendor-specific extensions to the language.
This points to a number of shortcomings in XSL-FO that really should have been addressed a long time ago with a 2.0 version of the specification. A W3C committee to develop an XSL-FO 2.0 spec was formed and then disbanded quite some time ago; I have no idea why, as I find the tool indispensable for a large class of document-to-PDF conversions.
I am writing a webserver in C++. I am looking at the POST documentation on w3:
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4
I see that a POST is supposed to support the full multipart scheme: parts and sub-parts (and obviously, sub-sub-parts...), just like email attachments.
Is there any browser and/or tool that does that on a regular basis? In other words, is it really important for a server to support parts and sub-parts?
The obvious problem with that is that it could mean two files are uploaded under the same name, which is quite a problem if you ask me. Also, from what I can see, PHP does not support this at all. Am I correct?
Ah! I guess I should have searched a little more; to tell you the truth, I had not thought of looking at HTML5 for the answer.
The following paragraph actually includes the answer:
http://www.w3.org/html/wg/drafts/html/master/forms.html#multipart-form-data
Note: In particular, this means that multiple files submitted as part of a single `<input type=file multiple>` element will result in each file having its own field; the "sets of files" feature ("multipart/mixed") of RFC 2388 is not used.
So it is clear that sub-parts (multipart/mixed) are not to be supported.
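To see what that means on the wire, here is a small sketch in Python using the requests library (the library and the http://example.com/submit endpoint are illustrative, not from the question). It builds a submission with two files under the same field name; each file becomes its own sibling part of the multipart/form-data body, and no multipart/mixed sub-part is nested inside:

```python
# Sketch: build (without sending) a multipart/form-data request with two
# files in the same field, and print the resulting body to show that each
# file is a separate top-level part rather than a nested multipart/mixed.
import requests

files = [
    ("upload", ("a.txt", b"first file", "text/plain")),
    ("upload", ("b.txt", b"second file", "text/plain")),
]
req = requests.Request("POST", "http://example.com/submit", files=files).prepare()

print(req.headers["Content-Type"])            # multipart/form-data; boundary=...
print(req.body.decode("utf-8", "replace"))    # two sibling parts named "upload"
```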
Does anyone know any more details about Google's web crawler (aka Googlebot)? I was curious about what it was written in (I've made a few crawlers myself and am about to make another) and whether it parses images and such. I'm assuming it does somewhere along the line, because the images on images.google.com are all resized. It also wouldn't surprise me if it was all written in Python and if they used their own libraries for almost everything, including HTML/image/PDF parsing. Maybe they don't, though. Maybe it's all written in C/C++. Thanks in advance.
You can find a bit about how Googlebot works here:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=158587
For example, the "Fetch as Googlebot" tool lets you see a page as Googlebot sees it.
The crawler is very likely written in C or C++; at least BackRub's crawler was written in one of those.
Be aware that the crawler only takes a snapshot of the page, then stores it in a temporary database for later processing. The indexing and other attached algorithms will extract the data, for example the image references.
Officially allowed languages at Google, I think, are Python/C++/Java.
The bot likely uses all 3 for different tasks.