Optimizing a Proximity-based Store Location Search on a Shared Web Host?

I’ve got a project where I need to build a store locator for a client.

I’m using a custom post type “restaurant-location” and I have written the code to geocode the addresses stored in postmeta using the Google Geocoding API (heres the link that geocodes the US White House in JSON and I’ve stored the latitude and longitude back to custom fields.

Read More

I’ve written a get_posts_by_geo_distance() function that returns a list of posts in order of those which are closest geographically using the formula I found in the slideshow at this post. You might call my function like so (I’m starting with a fixed “source” lat/long):

include "wp-load.php";

$source_lat = 30.3935337;
$source_long = -86.4957833;

$results = get_posts_by_geo_distance(
    'restaurant-location',
    'geo_latitude',
    'geo_longitude',
    $source_lat,
    $source_long);

echo '<ul>';
foreach($results as $post) {
    $edit_url = get_edit_url($post->ID);
    echo "<li>{$post->distance}: <a href="{$edit_url}" target="_blank">{$post->location}</a></li>";
}
echo '</ul>';
return;

Here’s the function get_posts_by_geo_distance() itself:

function get_posts_by_geo_distance($post_type,$lat_key,$lng_key,$source_lat,$source_lng) {
    global $wpdb;
    $sql =<<<SQL
SELECT
    rl.ID,
    rl.post_title AS location,
    ROUND(3956*2*ASIN(SQRT(POWER(SIN(({$source_lat}-abs(lat.lat))*pi()/180/2),2)+
    COS({$source_lat}*pi()/180)*COS(abs(lat.lat)*pi()/180)*
    POWER(SIN(({$source_lng}-lng.lng)*pi()/180/2),2))),3) AS distance
FROM
    wp_posts rl
    INNER JOIN (SELECT post_id,CAST(meta_value AS DECIMAL(11,7)) AS lat FROM wp_postmeta lat WHERE lat.meta_key='{$lat_key}') lat ON lat.post_id = rl.ID
    INNER JOIN (SELECT post_id,CAST(meta_value AS DECIMAL(11,7)) AS lng FROM wp_postmeta lng WHERE lng.meta_key='{$lng_key}') lng ON lng.post_id = rl.ID
WHERE
    rl.post_type='{$post_type}' AND rl.post_name<>'auto-draft'
ORDER BY
    distance
SQL;
    $sql = $wpdb->prepare($sql,$source_lat,$source_lat,$source_lng);
    return $wpdb->get_results($sql);
}

My concern is that the SQL is about as non-optimized as you can get. MySQL can’t order by any available index since the source geo is changeable and there’s not a finite set of source geos to cache. Currently I’m stumped as to ways to optimize it.

Taking into consideration what I’ve done already the question is: How would you go about optimizing this use-case?

It’s not important that I keep anything I’ve done if a better solution would have me throw it out. I’m open to considering almost any solution except for one that requires doing something like installing a Sphinx server or anything that requires a customized MySQL configuration. Basically the solution needs to be able to work on any plain vanilla WordPress install. (That said, it would be great if anyone wants to list alternate solutions for others who might be able to get more advanced and for posterity.)

Resources Found

FYI, I did a bit of research on this so rather than have you do the research again or rather than have you post any of these links as an answer I’ll go ahead and include them.

Regarding Sphinx Search

Related posts

Leave a Reply

4 comments

  1. What precision do you need? if it’s a state/national wide search maybe you could do a lat-lon to zip lookup and have precomputed distance from zip area to zip area of the restaurant. If you need accurate distances that won’t be a good option.

    You should look into a Geohash solution, in the Wikipedia article there is a link to a PHP library to encode decode lat long to geohashs.

    Here you have a good article explaining why and how they use it in Google App Engine (Python code but easy to follow.) Because of the need to use geohash in GAE you can find some good python libraries and examples.

    As this blog post explains, the advantage of using geohashes is that you can create an index on the MySQL table on that field.

  2. This might be too late for you, but I’m going to reply anyway, with a similar answer as I gave to this related question, so future visitors can refer to both questions.

    I would not store these values in the post metadata table, or at least not only there. You want a table with post_id, lat, lon columns, so you can place an index of lat, lon and query on that. This should not be too hard to keep up to date with a hook on post save and update.

    When you query the database, you define a bounding box around the starting point, so you can do an efficient query for all lat, lon pairs between the North-South and East-West borders of the box.

    After you get this reduced result, you can do a more advanced (circular or actual driving directions) distance calculation to filter out the locations that are in the corners of the bounding box and therefor further away than you desire.

    Here you find a simple code example that works in the admin area. You need to create the extra database table yourself. The code is ordered from most to least interesting.

    <?php
    /*
    Plugin Name: Monkeyman geo test
    Plugin URI: http://www.monkeyman.be
    Description: Geolocation test
    Version: 1.0
    Author: Jan Fabry
    */
    
    class Monkeyman_Geo
    {
        public function __construct()
        {
            add_action('init', array(&$this, 'registerPostType'));
            add_action('save_post', array(&$this, 'saveLatLon'), 10, 2);
    
            add_action('admin_menu', array(&$this, 'addAdminPages'));
        }
    
        /**
         * On post save, save the metadata in our special table
         * (post_id INT, lat DECIMAL(10,5), lon DECIMAL (10,5))
         * Index on lat, lon
         */
        public function saveLatLon($post_id, $post)
        {
            if ($post->post_type != 'monkeyman_geo') {
                return;
            }
            $lat = floatval(get_post_meta($post_id, 'lat', true));
            $lon = floatval(get_post_meta($post_id, 'lon', true));
    
            global $wpdb;
            $result = $wpdb->replace(
                $wpdb->prefix . 'monkeyman_geo',
                array(
                    'post_id' => $post_id,
                    'lat' => $lat,
                    'lon' => $lon,
                ),
                array('%s', '%F', '%F')
            );
        }
    
        public function addAdminPages()
        {
            add_management_page( 'Quick location generator', 'Quick generator', 'edit_posts', __FILE__  . 'generator', array($this, 'doGeneratorPage'));
            add_management_page( 'Location test', 'Location test', 'edit_posts', __FILE__ . 'test', array($this, 'doTestPage'));
    
        }
    
        /**
         * Simple test page with a location and a distance
         */
        public function doTestPage()
        {
            if (!array_key_exists('search', $_REQUEST)) {
                $default_lat = ini_get('date.default_latitude');
                $default_lon = ini_get('date.default_longitude');
    
                echo <<<EOF
    <form action="" method="post">
        <p>Center latitude: <input size="10" name="center_lat" value="{$default_lat}"/>
            <br/>Center longitude: <input size="10" name="center_lon" value="{$default_lon}"/>
            <br/>Max distance (km): <input size="5" name="max_distance" value="100"/></p>
        <p><input type="submit" name="search" value="Search!"/></p>
    </form>
    EOF;
                return;
            }
            $center_lon = floatval($_REQUEST['center_lon']);
            $center_lat = floatval($_REQUEST['center_lat']);
            $max_distance = floatval($_REQUEST['max_distance']);
    
            var_dump(self::getPostsUntilDistanceKm($center_lon, $center_lat, $max_distance));
        }
    
        /**
         * Get all posts that are closer than the given distance to the given location
         */
        public static function getPostsUntilDistanceKm($center_lon, $center_lat, $max_distance)
        {
            list($north_lat, $east_lon, $south_lat, $west_lon) = self::getBoundingBox($center_lat, $center_lon, $max_distance);
    
            $geo_posts = self::getPostsInBoundingBox($north_lat, $east_lon, $south_lat, $west_lon);
    
            $close_posts = array();
            foreach ($geo_posts as $geo_post) {
                $post_lat = floatval($geo_post->lat);
                $post_lon = floatval($geo_post->lon);
                $post_distance = self::calculateDistanceKm($center_lat, $center_lon, $post_lat, $post_lon);
                if ($post_distance < $max_distance) {
                    $close_posts[$geo_post->post_id] = $post_distance;
                }
            }
            return $close_posts;
        }
    
        /**
         * Select all posts ids in a given bounding box
         */
        public static function getPostsInBoundingBox($north_lat, $east_lon, $south_lat, $west_lon)
        {
            global $wpdb;
            $sql = $wpdb->prepare('SELECT post_id, lat, lon FROM ' . $wpdb->prefix . 'monkeyman_geo WHERE lat < %F AND lat > %F AND lon < %F AND lon > %F', array($north_lat, $south_lat, $west_lon, $east_lon));
            return $wpdb->get_results($sql, OBJECT_K);
        }
    
        /* Geographical calculations: distance and bounding box */
    
        /**
         * Calculate the distance between two coordinates
         * http://stackoverflow.com/questions/365826/calculate-distance-between-2-gps-coordinates/1416950#1416950
         */
        public static function calculateDistanceKm($a_lat, $a_lon, $b_lat, $b_lon)
        {
            $d_lon = deg2rad($b_lon - $a_lon);
            $d_lat = deg2rad($b_lat - $a_lat);
            $a = pow(sin($d_lat/2.0), 2) + cos(deg2rad($a_lat)) * cos(deg2rad($b_lat)) * pow(sin($d_lon/2.0), 2);
            $c = 2 * atan2(sqrt($a), sqrt(1-$a));
            $d = 6367 * $c;
    
            return $d;
        }
    
        /**
         * Create a box around a given point that extends a certain distance in each direction
         * http://www.colorado.edu/geography/gcraft/warmup/aquifer/html/distance.html
         *
         * @todo: Mind the gap at 180 degrees!
         */
        public static function getBoundingBox($center_lat, $center_lon, $distance_km)
        {
            $one_lat_deg_in_km = 111.321543; // Fixed
            $one_lon_deg_in_km = cos(deg2rad($center_lat)) * 111.321543; // Depends on latitude
    
            $north_lat = $center_lat + ($distance_km / $one_lat_deg_in_km);
            $south_lat = $center_lat - ($distance_km / $one_lat_deg_in_km);
    
            $east_lon = $center_lon - ($distance_km / $one_lon_deg_in_km);
            $west_lon = $center_lon + ($distance_km / $one_lon_deg_in_km);
    
            return array($north_lat, $east_lon, $south_lat, $west_lon);
        }
    
        /* Below this it's not interesting anymore */
    
        /**
         * Generate some test data
         */
        public function doGeneratorPage()
        {
            if (!array_key_exists('generate', $_REQUEST)) {
                $default_lat = ini_get('date.default_latitude');
                $default_lon = ini_get('date.default_longitude');
    
                echo <<<EOF
    <form action="" method="post">
        <p>Number of posts: <input size="5" name="post_count" value="10"/></p>
        <p>Center latitude: <input size="10" name="center_lat" value="{$default_lat}"/>
            <br/>Center longitude: <input size="10" name="center_lon" value="{$default_lon}"/>
            <br/>Max distance (km): <input size="5" name="max_distance" value="100"/></p>
        <p><input type="submit" name="generate" value="Generate!"/></p>
    </form>
    EOF;
                return;
            }
            $post_count = intval($_REQUEST['post_count']);
            $center_lon = floatval($_REQUEST['center_lon']);
            $center_lat = floatval($_REQUEST['center_lat']);
            $max_distance = floatval($_REQUEST['max_distance']);
    
            list($north_lat, $east_lon, $south_lat, $west_lon) = self::getBoundingBox($center_lat, $center_lon, $max_distance);
    
    
            add_action('save_post', array(&$this, 'setPostLatLon'), 5);
            $precision = 100000;
            for ($p = 0; $p < $post_count; $p++) {
                self::$currentRandomLat = mt_rand($south_lat * $precision, $north_lat * $precision) / $precision;
                self::$currentRandomLon = mt_rand($west_lon * $precision, $east_lon * $precision) / $precision;
    
                $location = sprintf('(%F, %F)', self::$currentRandomLat, self::$currentRandomLon);
    
                $post_data = array(
                    'post_status' => 'publish',
                    'post_type' => 'monkeyman_geo',
                    'post_content' => 'Point at ' . $location,
                    'post_title' => 'Point at ' . $location,
                );
    
                var_dump(wp_insert_post($post_data));
            }
        }
    
        public static $currentRandomLat = null;
        public static $currentRandomLon = null;
    
        /**
         * Because I didn't know how to save meta data with wp_insert_post,
         * I do it here
         */
        public function setPostLatLon($post_id)
        {
            add_post_meta($post_id, 'lat', self::$currentRandomLat);
            add_post_meta($post_id, 'lon', self::$currentRandomLon);
        }
    
        /**
         * Register a simple post type for us
         */
        public function registerPostType()
        {
            register_post_type(
                'monkeyman_geo',
                array(
                    'label' => 'Geo Location',
                    'labels' => array(
                        'name' => 'Geo Locations',
                        'singular_name' => 'Geo Location',
                        'add_new' => 'Add new',
                        'add_new_item' => 'Add new location',
                        'edit_item' => 'Edit location',
                        'new_item' => 'New location',
                        'view_item' => 'View location',
                        'search_items' => 'Search locations',
                        'not_found' => 'No locations found',
                        'not_found_in_trash' => 'No locations found in trash',
                        'parent_item_colon' => null,
                    ),
                    'description' => 'Geographical locations',
                    'public' => true,
                    'exclude_from_search' => false,
                    'publicly_queryable' => true,
                    'show_ui' => true,
                    'menu_position' => null,
                    'menu_icon' => null,
                    'capability_type' => 'post',
                    'capabilities' => array(),
                    'hierarchical' => false,
                    'supports' => array(
                        'title',
                        'editor',
                        'custom-fields',
                    ),
                    'register_meta_box_cb' => null,
                    'taxonomies' => array(),
                    'permalink_epmask' => EP_PERMALINK,
                    'rewrite' => array(
                        'slug' => 'locations',
                    ),
                    'query_var' => true,
                    'can_export' => true,
                    'show_in_nav_menus' => true,
                )
            );
        }
    }
    
    $monkeyman_Geo_instance = new Monkeyman_Geo();
    
  3. I’m late to the party on this one, but looking back at this, the get_post_meta is really the problem here, rather than the SQL query you’re using.

    I recently had to do a similar geo lookup on a site that I run, and rather than using the meta table to store lat and lon (which requires at best two joins to look up and, if you’re using get_post_meta, two additional database queries per location), I created a new table with a spatially indexed geometry POINT data type.

    My query looked a lot like yours, with MySQL doing a lot of the heavy lifting (I left out the trig functions and simplified everything to two-dimensional space, because it was close enough for my purposes):

    function nearby_property_listings( $number = 5 ) {
        global $client_location, $wpdb;
    
        //sanitize public inputs
        $lat = (float)$client_location['lat'];  
        $lon = (float)$client_location['lon']; 
    
        $sql = $wpdb->prepare( "SELECT *, ROUND( SQRT( ( ( ( Y(geolocation) - $lat) * 
                                                           ( Y(geolocation) - $lat) ) *
                                                             69.1 * 69.1) +
                                                      ( ( X(geolocation) - $lon ) * 
                                                           ( X(geolocation) - $lon ) * 
                                                             53 * 53 ) ) ) as distance
                                FROM {$wpdb->properties}
                                ORDER BY distance LIMIT %d", $number );
    
        return $wpdb->get_results( $sql );
    }
    

    where $client_location is a value returned by a public geo IP lookup service (I used geoio.com, but there are a number of similar ones.)

    It may seem unwieldy, but in testing it, it consistently returned the nearest 5 locations out of a 80,000-row table in under .4 sec.

    Until MySQL rolls out the DISTANCE function that’s being proposed, this seems like the best way I found to implement location lookups.

    EDIT: Adding the table structure for this particular table. Its a set of property listings, so it may or may not be similar to any other use-case.

    CREATE TABLE IF NOT EXISTS `rh_properties` (
      `listingId` int(10) unsigned NOT NULL,
      `listingType` varchar(60) collate utf8_unicode_ci NOT NULL,
      `propertyType` varchar(60) collate utf8_unicode_ci NOT NULL,
      `status` varchar(20) collate utf8_unicode_ci NOT NULL,
      `street` varchar(64) collate utf8_unicode_ci NOT NULL,
      `city` varchar(24) collate utf8_unicode_ci NOT NULL,
      `state` varchar(5) collate utf8_unicode_ci NOT NULL,
      `zip` decimal(5,0) unsigned zerofill NOT NULL,
      `geolocation` point NOT NULL,
      `county` varchar(64) collate utf8_unicode_ci NOT NULL,
      `bedrooms` decimal(3,2) unsigned NOT NULL,
      `bathrooms` decimal(3,2) unsigned NOT NULL,
      `price` mediumint(8) unsigned NOT NULL,
      `image_url` varchar(255) collate utf8_unicode_ci NOT NULL,
      `description` mediumtext collate utf8_unicode_ci NOT NULL,
      `link` varchar(255) collate utf8_unicode_ci NOT NULL,
      PRIMARY KEY  (`listingId`),
      KEY `geolocation` (`geolocation`(25))
    )
    

    The geolocation column is the only thing relevant for the purposes here; it consists of x(lon),y(lat) coordinates that I just look up from the address upon importing new values into the database.