Get All Page Titles On Wikipedia That Contain A Specific Word

I am writing an 'auto-wikifier' tool using HTML and JavaScript. For each word in the text to be wikified, I need to obtain a list of pages that contain that word (so that the match

Solution 1:

First, I'm not sure I understand how something like that would be useful. (Wikipedia has articles for all the common words, and I don't think links to them would be of any use.)

But if you really wanted to do something like this, I think a much better way would be to use the API to find out which words from your input text have articles.

For example, for the string I am writing an "auto-wikifier" tool, your query could look something like:

http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=I|am|writing|an|auto-wikifier|tool

And the answer is:

<api>
  <query>
    <normalized>
      <n from="am" to="Am" />
      <n from="writing" to="Writing" />
      <n from="an" to="An" />
      <n from="auto-wikifier" to="Auto-wikifier" />
      <n from="tool" to="Tool" />
    </normalized>
    <pages>
      <page ns="0" title="Auto-wikifier" missing="" />
      <page pageid="2513432" ns="0" title="Am" />
      <page pageid="2513422" ns="0" title="An" />
      <page pageid="25346998" ns="0" title="I" />
      <page pageid="30677" ns="0" title="Tool" />
      <page pageid="32977" ns="0" title="Writing" />
    </pages>
  </query>
</api>

A few notes:

  • The results are not in the order you specified them.
  • If a page doesn't exist, its entry in the result has a missing="" attribute.
  • JSON and JSONP formats are available too, and they might be more suitable for JavaScript.
  • The titles parameter accepts at most 50 titles per query.
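
To tie the notes above together, here is a minimal sketch of how the lookup might be done from JavaScript using the JSON format. The helper names (chunkTitles, buildQueryUrl, wordsWithArticles) are made up for illustration, the HTTPS endpoint and the origin=* parameter (for anonymous cross-origin requests from a browser) are assumptions on top of the example URL, and the response shape is assumed to be the JSON counterpart of the XML shown above.

```javascript
// Limit on the titles parameter, as mentioned in the notes.
const MAX_TITLES_PER_QUERY = 50;

// Assumption: same endpoint as the example URL, but over HTTPS.
const API_URL = "https://en.wikipedia.org/w/api.php";

// Split the words into batches that fit within the titles limit.
// Duplicates are dropped so each word is only asked about once.
function chunkTitles(words, size = MAX_TITLES_PER_QUERY) {
  const unique = [...new Set(words)];
  const batches = [];
  for (let i = 0; i < unique.length; i += size) {
    batches.push(unique.slice(i, i + size));
  }
  return batches;
}

// Build the query URL for one batch, asking for JSON instead of XML.
function buildQueryUrl(titles) {
  const params = new URLSearchParams({
    action: "query",
    format: "json",
    origin: "*", // assumption: anonymous cross-origin request from a browser
    titles: titles.join("|"),
  });
  return `${API_URL}?${params}`;
}

// Return the input words that have an article, given one parsed response.
// Results come back unordered, so pages are matched by title, going
// through the "normalized" list (e.g. "am" -> "Am") where needed.
function wordsWithArticles(words, response) {
  const query = response.query || {};
  const normalized = new Map(
    (query.normalized || []).map((n) => [n.from, n.to])
  );
  const existing = new Set(
    Object.values(query.pages || {})
      .filter((page) => !("missing" in page))
      .map((page) => page.title)
  );
  return words.filter((word) =>
    existing.has(normalized.has(word) ? normalized.get(word) : word)
  );
}
```

In a browser, each URL from buildQueryUrl could be passed to fetch, and the parsed JSON handed to wordsWithArticles together with the batch of words it was built from.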

Solution 2:

API:Allpages is an interesting start. Sadly, it is limited to 500 results per query.
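
As a rough sketch of what paging through Allpages might look like, the code below requests titles by prefix and follows the continuation data until it runs out or a request cap is hit. The function names and the fetchJson parameter (any function that takes a URL and resolves to parsed JSON) are hypothetical, and the code assumes the response carries a top-level continue object to be echoed back on the next request. Note that apprefix only matches titles that begin with the given string, so this does not find titles that merely contain the word.

```javascript
// Assumption: the English Wikipedia endpoint, over HTTPS.
const ALLPAGES_API_URL = "https://en.wikipedia.org/w/api.php";

// Build a list=allpages URL; `cont` holds continuation parameters
// copied from the previous response (empty for the first request).
function buildAllpagesUrl(prefix, cont = {}) {
  const params = new URLSearchParams({
    action: "query",
    format: "json",
    origin: "*", // assumption: anonymous cross-origin request from a browser
    list: "allpages",
    apprefix: prefix,
    aplimit: "500",
    ...cont,
  });
  return `${ALLPAGES_API_URL}?${params}`;
}

// Pull the titles and the continuation data out of one parsed response.
function readAllpagesBatch(response) {
  const pages = (response.query && response.query.allpages) || [];
  return {
    titles: pages.map((page) => page.title),
    next: response.continue || null,
  };
}

// Collect titles across several requests. `fetchJson` is injected so the
// loop can run in a browser, in Node, or against a stub in tests.
async function listTitlesWithPrefix(prefix, fetchJson, maxRequests = 5) {
  const titles = [];
  let cont = {};
  for (let i = 0; i < maxRequests; i++) {
    const response = await fetchJson(buildAllpagesUrl(prefix, cont));
    const batch = readAllpagesBatch(response);
    titles.push(...batch.titles);
    if (!batch.next) break; // no more results
    cont = batch.next;
  }
  return titles;
}
```

In a browser, fetchJson could be as simple as (url) => fetch(url).then((r) => r.json()). The maxRequests cap is there because a short prefix can match a very large number of titles.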
