wordpress query with quote/apostrophe varrients

This is a bit of a two fold question. I’ve an ajax request that polles for duplicate post titles, but it’s getting thrown by different quote/apostrophes and their variants, returning negative when I know there is a duplicate.

I have a post titled: “Ben’s Big Fish” ie with the apostrophe (’)

Read More

But doing a query for the following always comes back negative:

Ben's Big Fish (')
Ben’s Big Fish (’)
Bens Big Fish (no apos)

However a query for Big Fish returns all the variant posts titles with these words in it, including post titles with quotes and apostrophes in them.

Here are the main characters that are also causing issue:

Apostrophe          '   '
Open single quote   ‘   ‘ 
Close single quote  ’   ’
--- 
Quotation mark      "   "
Open double quotes  “   “ 
Close double quotes ”   ”

As users are often pulling text from MS Word docs etc, these characters are coming up a lot.

On the js end I encode the post title by passing it through this function before sending it via json to my ajax handler:

function htmlEntities(str) {
    return String(str).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;').replace(/'/g, '&apos;').replace(/‘/g, '&lsquo;').replace(/’/g, '&rsquo;').replace(/“/g, '&ldquo;').replace(/”/g, '&rdquo;');
} 

In my php ajax hook I’m treating the incoming POST query as follows:

global $wpdb;
// Grab details from inbound POST array & prepare for sql
$title = html_entity_decode($_POST['post_title']); //first un-encode
$post_id = $_POST['post_id'];

$sim_query = "SELECT ID FROM $wpdb->posts WHERE post_status = 'publish' AND post_title LIKE '%%%s%%' AND ID != '%d'";
$sim_results = $wpdb->get_results( $wpdb->prepare( $sim_query, $wpdb->esc_like($title), $post_id ) );
if ($sim_results)
{ // Send the results back as json }

So my question is
a) how to get the query to return the obvious duplicates as expected
b) and possibly related, there a way we can efficiently search through strings that look for all variants the appearance of apostrophe and quotation characters without multiple queries?

Related posts

1 comment

  1. The crux of the problem actually fell back to the original encoding from JS. One of the key characters that was tripping us up: &apos;, is not actually decoded by html_entity_decode, even with the ENT_QUOTES flag set. Instead it expects &#039;.

    So in the end our js looks like:

    function htmlEntities(str) {
        return String(str).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;').replace(/'/g, '&#039;').replace(/‘/g, '&lsquo;').replace(/’/g, '&rsquo;').replace(/“/g, '&ldquo;').replace(/”/g, '&rdquo;');
    } 
    

    and we decode in PHP:

     $title = html_entity_decode($_POST['post_title'], ENT_QUOTES,  'UTF-8' ); //first un-encode
    

    Its also important to note that SQL, will balk at single quotes, and apostrophes. It requires that they be escaped by doubling them like so: ''. WordPress takes care of the escaping for us when we use its SQL escaping class $wpdb->prepare

Comments are closed.