RegExp trouble with a LONG pattern

Usually the string is long and the pattern is short when working with regular expressions, but this time it is the other way around. I have a short text of about 500 characters. In that text I want to find names that match a database of about 47,000 unique names and turn each match into a link to that name. What is the best way of doing this? I partition the array of names into 64 partitions, since a single array is too big to process as one pattern.

function implode_r ($glue, $pieces){
    $out = "";
    foreach ($pieces as $piece){
        if (is_array ($piece)){
            $out .= implode_r ($glue, $piece); // recurse
        }
        else{
            if(strlen($piece)>1){
                // Escape regex metacharacters so each name is matched literally.
                $piece = preg_quote($piece, "/");
                $out .= $glue.$piece;
            }
        }
    }
    return $out;
}

function partition( $list, $p ) {
    $listlen = count( $list );
    $partlen = floor( $listlen / $p );
    $partrem = $listlen % $p;
    $partition = array();
    $mark = 0;
    for ($px = 0; $px < $p; $px++) {
        $incr = ($px < $partrem) ? $partlen + 1 : $partlen;
        $partition[$px] = array_slice( $list, $mark, $incr );
        $mark += $incr;
    }
    return $partition;
}

add_filter( 'the_content', 'find_names_in_text');
add_filter( 'get_the_content', 'find_names_in_text');
function find_names_in_text($content){
    global $wpdb;
    $thenames = $wpdb->get_results("SELECT post_title FROM $wpdb->posts WHERE post_type='dogs' GROUP BY post_title", ARRAY_N);
    $namesparts = partition($thenames, 64);
    foreach($namesparts as $part){
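        // implode_r() prefixes every name with the glue, so drop the leading "|".
        $pattern = ltrim(implode_r("|", $part), "|");
        // "/" delimiters plus an explicit group, so that $1 holds the matched name.
        $content = preg_replace("/(".$pattern.")/", "<a href='$1'>$1</a>", $content);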
    }
    return $content;
}

1 comment

  1. If your text is only 500 characters, I’d go about it the other way around. Split the text into parts that could be a name (assuming names are single words; I presume there are no multi-word names).

    So now you have fewer than 500 words to match against your database, so in the worst case you have to check the database for each of those. Of course you won’t get 500 words out of 500 characters, so you’d end up with a manageable query.
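
The approach in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions: names are single words, the text is plain text rather than HTML, and the function names `candidate_words()` and `link_names()` are made up for this example. The in-memory `$allNames` array stands in for the database; in practice the candidate words would go into a single query (for example a `WHERE post_title IN (...)` clause) instead.

```php
<?php
// Hypothetical helpers, not part of the original code.

// Extract the unique words of the text; each one is a candidate name.
function candidate_words(string $text): array
{
    // \p{L}+ matches runs of letters (Unicode-aware because of the u flag).
    preg_match_all('/\p{L}+/u', $text, $m);
    return array_values(array_unique($m[0]));
}

// Wrap every word of $text that is listed in $names in a link.
function link_names(string $text, array $names): string
{
    // Turn the list into a set so that each lookup is a hash access.
    $set = array_fill_keys($names, true);

    return preg_replace_callback(
        '/\p{L}+/u',
        function (array $m) use ($set): string {
            $word = $m[0];
            // Same link format as in the question: the name is the href.
            return isset($set[$word]) ? "<a href='$word'>$word</a>" : $word;
        },
        $text
    );
}

$text     = 'Rex met Bella and Max.';
$allNames = ['Bella', 'Rex', 'Luna']; // stand-in for the 47,000 names in the database

// Only the handful of candidate words is checked against the name list.
$found = array_intersect(candidate_words($text), $allNames);

echo link_names($text, $found), "\n";
// prints: <a href='Rex'>Rex</a> met <a href='Bella'>Bella</a> and Max.
```

The short text drives the work here: the regular expression stays tiny, and the number of lookups is bounded by the number of words in the text rather than by the size of the name list. Multi-word names would need a different tokenization step.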