
What's A Simple Way To Efficiently Find Specific Terms Or Phrases Within A Short Unknown String?

I'm working on a Twitter feed visualization with a big dataset, and I only want to use tweets whose text contains specific words or phrases. I currently have this line: data = data.filter(

Solution 1:

Place your search terms in their own array and then cycle through it when running the check.

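var searchTerms = ['new year', 'christmas', 'boxing day'];

// Keep only the tweets whose text contains at least one search term.
// Note: indexOf is case-sensitive, so 'Christmas' will not match 'christmas'.
data = data.filter(function (d) {
   // some() stops at the first matching term instead of checking them all.
   return searchTerms.some(function (term) {
      return d.text.indexOf(term) !== -1;
   });
});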

Solution 2:

This is a pretty classic string-search / string-matching problem. First, some terminology: the string-matching literature usually calls the search space the 'text' (here, your tweet or tweets) and the things you are searching for the 'patterns' (your search terms).

The complexity of most string-matching algorithms is measured in terms of the length of the text, the length of the pattern(s), and the number of matches.

The naive approach is of course nested loops and linear search. Pseudocode:

foreach text (tweet)
    foreach pattern (search term)
        linear search the text for the pattern

That's O(t * p) in the worst case, where t is the total length of all texts and p is the total length of all patterns. You can probably improve considerably on this, especially if either the texts or the patterns are fixed over multiple runs, because that lets you pre-process the fixed side once and search it efficiently afterwards. Take a look at Wikipedia's description of string search algorithms for a few possibilities.
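As one cheap form of pre-processing in JavaScript, the fixed search terms can be compiled into a single regular expression once, and that expression reused for every tweet. This is only a sketch: `escapeRegExp`, `compileTerms`, and the sample `tweets` array are names made up for illustration, it assumes a non-empty list of terms, and whether it actually beats a plain loop depends on the regex engine, since it mostly moves the per-term loop into the engine rather than changing the worst-case complexity.

```javascript
// Escape characters that have special meaning in a regular expression,
// so each search term is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Compile all terms into one alternation, once, up front.
// flags is optional; pass 'i' for case-insensitive matching.
// Assumes terms is non-empty (an empty list would match every string).
function compileTerms(terms, flags) {
  return new RegExp(terms.map(escapeRegExp).join('|'), flags || '');
}

var pattern = compileTerms(['new year', 'christmas', 'boxing day']);

// Hypothetical sample data in the same shape as the question's dataset.
var tweets = [
  { text: 'Happy new year!' },
  { text: 'Lunch was great' },
  { text: 'boxing day sales start now' }
];

// No 'g' flag is used, so test() keeps no state between calls.
var kept = tweets.filter(function (d) {
  return pattern.test(d.text);
});
// kept now holds the first and third tweets.
```

The escaping step matters if a search term ever contains characters such as `.` or `+`; without it, the term would be interpreted as regex syntax rather than literal text.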

Your choice of a specific algorithm will probably depend on your memory constraints and the trade-off between pre-processing time and runtime complexity, but I'll throw out a couple of things to look at. It sounds like your patterns are probably fixed and your text may vary (searching different Twitter feeds?), so you might want to look at the Aho-Corasick algorithm, which builds an automaton from the patterns once and then scans each text in a single pass. You might find a suffix tree a useful data structure as well, although it pre-processes the text rather than the patterns, so it fits better when the text is the fixed side. The links from those Wikipedia pages and a Google search for those terms should help you get started (you might even find implemented code to help, although I don't do JavaScript, so I wouldn't know what to recommend there).
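For readers who want to see the idea in JavaScript, here is a minimal Aho-Corasick sketch. It is not taken from any library: `buildAutomaton`, `containsAny`, and the sample data are hypothetical names written for this illustration, and it assumes every pattern is a non-empty string. It only answers "does any pattern occur in this text?", which is all the filter in the question needs.

```javascript
// Build an Aho-Corasick automaton from a fixed list of non-empty patterns.
function buildAutomaton(patterns) {
  // Each node stores outgoing edges, a failure link, and the indexes
  // of the patterns that end at this node.
  const nodes = [{ next: new Map(), fail: 0, out: [] }];

  // Phase 1: insert every pattern into a trie.
  patterns.forEach(function (pattern, index) {
    let state = 0;
    for (const ch of pattern) {
      if (!nodes[state].next.has(ch)) {
        nodes.push({ next: new Map(), fail: 0, out: [] });
        nodes[state].next.set(ch, nodes.length - 1);
      }
      state = nodes[state].next.get(ch);
    }
    nodes[state].out.push(index);
  });

  // Phase 2: breadth-first pass to set failure links.
  // Nodes directly below the root keep their default failure link (the root).
  const queue = Array.from(nodes[0].next.values());
  for (let head = 0; head < queue.length; head++) {
    const state = queue[head];
    nodes[state].next.forEach(function (child, ch) {
      // Walk failure links until a state with an edge on ch is found.
      let f = nodes[state].fail;
      while (f !== 0 && !nodes[f].next.has(ch)) {
        f = nodes[f].fail;
      }
      nodes[child].fail = nodes[f].next.has(ch) ? nodes[f].next.get(ch) : 0;
      // Inherit patterns that end at the failure target (shorter suffixes).
      nodes[child].out = nodes[child].out.concat(nodes[nodes[child].fail].out);
      queue.push(child);
    });
  }
  return nodes;
}

// Scan the text once; return true as soon as any pattern is found.
function containsAny(nodes, text) {
  let state = 0;
  for (const ch of text) {
    while (state !== 0 && !nodes[state].next.has(ch)) {
      state = nodes[state].fail;
    }
    state = nodes[state].next.has(ch) ? nodes[state].next.get(ch) : 0;
    if (nodes[state].out.length > 0) {
      return true;
    }
  }
  return false;
}

// Build once, then reuse for every tweet.
const automaton = buildAutomaton(['new year', 'christmas', 'boxing day']);

// Hypothetical sample data in the same shape as the question's dataset.
const sampleTweets = [
  { text: 'happy new year everyone' },
  { text: 'nothing to see here' },
  { text: 'boxing day sales start now' }
];
const matching = sampleTweets.filter(function (d) {
  return containsAny(automaton, d.text);
});
// matching now holds the first and third tweets.
```

For three short search terms, the simple loop in Solution 1 is likely just as fast in practice; an automaton like this starts to pay off when the list of patterns grows large, because the cost of scanning each tweet no longer depends on how many patterns there are.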
