Boolean Search Operator Optimization, respecting quoted terms

Looking for the most optimal way to parse a complex search string and reduce it to have 1 Boolean operator between terms. The operators are OR, AND, NOR.

Ex:
1. OR AND NOR NOR AND AND Fred OR NOR AND Wilma NOR AND AND Barney OR “The and AND Flintstones” OR AND NOR NOR AND AND

Read More

OR AND NOR nor AND AND Fred OR NOR AND Wilma NOR and AND Barney OR “The and AND Flintstones” OR and NOR NOR AND AND (added Mar-05-2014)

Result:

Fred OR Wilma NOR Barney OR “The and AND Flintstones”

Note: I am looking for PHP code implementation.

Related posts

Leave a Reply

2 comments

  1. Your best bet would be to read the input word-by-word and process it in a state machine.

    Something along the lines:

    define("STATE_DEFAULT", 0); // we're in regular text
    define("STATE_OPERATOR", 1);  // we found operator (AND|OR|NOR)
    define("STATE_QUOTE",2); // we're inside quoted text
    
    $input = 'OR AND NOR NOR AND AND Fred OR NOR AND Wilma NOR AND AND Barney OR "The and AND Flintstones" OR AND NOR NOR AND AND';
    
    // check if a word is an operator... used in multiple places 
    function _is_op($word) { return preg_match("/^(AND|OR|NOR)$/i", $word); }
    
    $words = explode(" ", $input);
    $words_count = count($words);
    $state = STATE_DEFAULT;
    
    for($i=0; $i<$words_count; ++$i)
    {
        $word = $words[$i];
    
        switch($state)
        {
             case STATE_QUOTE:
                 if(substr($word,-1)=='"') $state = STATE_DEFAULT;
                 break;
    
             case STATE_OPERATOR:
                 if(_is_op($word))
                 {
                     unset($words[$i]);
                     break;
                 }
    
             case STATE_DEFAULT:
             default:
                $state = STATE_DEFAULT;
                 if($word[0] == '"')
                     $state = STATE_QUOTE;
                 elseif(_is_op($word))
                     $state = STATE_OPERATOR;
                 break;
        }
    }
    
    // if we removed some words, count()-1 is no longer the last element
    $words = array_values($words);
    
    // strip operators from start and end
    if(_is_op($words[0])) array_shift($words);
    if(_is_op($words[count($words)-1])) array_pop($words);
    
    $output = implode(" ", $words);
    

    While it might be possible to do this using regular expressions, it’d be complex and quite unmanageable .

  2. Hope this will help:
    
    $words = explode (" ", 'OR AND NOR NOR AND AND Fred OR NOR AND Wilma NOR AND AND Barney OR "The and AND Flintstones" OR AND NOR NOR AND AND');
    
    $arrayOperators = array('OR', 'AND', 'NOR');
    
    $flag = false;
    foreach($words as $key => $word){
        if($flag && in_array($word, $arrayOperators)){
              unset($words[$key]);
        }elseif(in_array($word, $arrayOperators)){
           if($key != 0){
              $resultArray[] = $word;
            }
            $flag = true;
    
        }else{
            $resultArray[] = $word;
            $flag = false;
        }
    }
    end($resultArray);
    $lastKey = key($resultArray);
    if(in_array($resultArray[$lastKey], $arrayOperators)){
       array_pop($resultArray);
    }
    
    var_dump(implode(' ', $resultArray));
    

    Result:

    string(53) "Fred OR Wilma NOR Barney OR "The and AND Flintstones""