Get All Page Titles On Wikipedia That Contain A Specific Word
I am writing an 'auto-wikifier' tool using HTML and JavaScript. For each word in the text to be wikified, I need to obtain a list of pages that contain that word (so that the match
Solution 1:
First, I'm not sure I understand how something like that would be useful. (Wikipedia has articles for all the common words, and I don't think links to them would be of any use.)
But if you really wanted to do something like this, I think a much better way would be to use the API to find out which words from your input text have articles.
For example, for the string I am writing an "auto-wikifier" tool, your query could look something like:
http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=I|am|writing|an|auto-wikifier|tool
And the answer is:
<api>
  <query>
    <normalized>
      <n from="am" to="Am" />
      <n from="writing" to="Writing" />
      <n from="an" to="An" />
      <n from="auto-wikifier" to="Auto-wikifier" />
      <n from="tool" to="Tool" />
    </normalized>
    <pages>
      <page ns="0" title="Auto-wikifier" missing="" />
      <page pageid="2513432" ns="0" title="Am" />
      <page pageid="2513422" ns="0" title="An" />
      <page pageid="25346998" ns="0" title="I" />
      <page pageid="30677" ns="0" title="Tool" />
      <page pageid="32977" ns="0" title="Writing" />
    </pages>
  </query>
</api>
A few notes:
- The results are not returned in the order you specified them.
- If a page doesn't exist, the result has a missing="" attribute.
- JSON and JSONP formats are available too, which might be more suitable for JavaScript.
- The titles parameter has a limit of 50 titles per query.
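Putting those notes together, here is a minimal JavaScript sketch of how the lookup could be batched and the answer read back. The helper names (buildTitleQueries, existingTitles) are made up for illustration. It assumes format=json with the default response layout, where query.pages is an object keyed by page ID and a missing page carries a "missing" key, and it adds origin=* on the assumption that the request is made anonymously from a browser page.

```javascript
// Hypothetical helpers; only the query parameters come from the MediaWiki API.
const API_URL = "https://en.wikipedia.org/w/api.php";
const MAX_TITLES = 50; // limit of the titles parameter per request

// Split the words into batches of at most 50 and build one query URL per batch.
function buildTitleQueries(words, batchSize = MAX_TITLES) {
  const unique = [...new Set(words)]; // no point asking for the same word twice
  const urls = [];
  for (let i = 0; i < unique.length; i += batchSize) {
    const params = new URLSearchParams({
      action: "query",
      format: "json",
      origin: "*", // assumption: anonymous cross-origin request from a browser
      titles: unique.slice(i, i + batchSize).join("|"),
    });
    urls.push(`${API_URL}?${params}`);
  }
  return urls;
}

// Return the titles of the pages that exist, skipping entries flagged as missing.
// The order follows the response, not the order of the input words.
function existingTitles(response) {
  const pages = (response.query && response.query.pages) || {};
  return Object.values(pages)
    .filter((page) => !("missing" in page) && !("invalid" in page))
    .map((page) => page.title);
}
```

Each URL can then be requested with fetch(url).then((r) => r.json()).then(existingTitles). Because the API normalizes titles (for example "tool" becomes "Tool"), the returned titles would still need to be mapped back to the original words through the normalized list in the response.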
Solution 2:
API:Allpages is an interesting start. Sadly, it returns at most 500 results per request, so a longer listing has to be fetched in several requests.
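A rough JavaScript sketch of paging through list=allpages could look like this. The function names are hypothetical. It assumes format=json and that each response carries a "continue" object whose fields are sent back unchanged with the next request. Note that apprefix only matches titles that start with the given text, so this does not find a word in the middle of a title.

```javascript
// Hypothetical helpers; only the query parameters come from the MediaWiki API.
const API_URL = "https://en.wikipedia.org/w/api.php";

// Build a list=allpages request for titles starting with `prefix`.
// `continuation` is the "continue" object of the previous response, if any.
function buildAllPagesQuery(prefix, continuation = {}) {
  const params = new URLSearchParams({
    action: "query",
    format: "json",
    origin: "*", // assumption: anonymous cross-origin request from a browser
    list: "allpages",
    apprefix: prefix,
    aplimit: "500", // the per-request maximum mentioned above
    ...continuation,
  });
  return `${API_URL}?${params}`;
}

// Pull the titles out of one response and report whether more results remain.
function readAllPages(response) {
  const pages = (response.query && response.query.allpages) || [];
  return {
    titles: pages.map((page) => page.title),
    continuation: response.continue || null, // null means the listing is complete
  };
}
```

The caller would keep requesting buildAllPagesQuery(prefix, continuation) until readAllPages returns a null continuation.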