Missing results due to geo proximity formula (store locator)

OK – I’ve been wrestling with this for about 3 months on and off and since I’ve exhausted every geo proximity formula out there that I’ve come across and I’m no closer to getting the right results I figured it time to ask for some help.

THE AIM

Read More

I’m setting up a fairly basic implementation of a store locator. The user enters their postcode and selects from a predefined list of search radii. The gmaps API generates lat/long coordinates for this address and passes them to a php script. In this script the user coords are queried against a mysql database table (structure below)

post_id int(11)                             
post_type varchar(20)                                
lat   float(10,6)                               
lng   float(10,6)

The results of this query (post ids) are entered into a wordpress query which generates the XML that contains the map marker data. (the wordpress query uses post__in and posts_per_page -1 to display info for all ID generated by the query

THE PROBLEM

In a nutshell, every implementation of the Haversine formula I’ve come across seems to result in missing markers – specifically any markers that are very close to the users entered coordinates (don’t know precisely but I think it’s within about 500m). This is a big problem as if the user enters their postcode and there is a store very close to their location it won’t show up.

I’ve tried about 8 different permutations of the forumla that I’ve dug up from various tutorials with the same results. Below is the formula that I’m currently using on the site which provides all markers except for the those very close to the users entered position:

$center_lat = $_GET["lat"];
$center_lng = $_GET["lng"];
$radius = $_GET["radius"];

// Calculate square radius search

$lat1 = (float) $center_lat - ( (int) $radius / 69 );
$lat2 = (float) $center_lat + ( (int) $radius / 69 );
$lng1 = (float) $center_lng - (int) $radius / abs( cos( deg2rad( (float) $center_lat ) ) * 69 );
$lng2 = (float) $center_lng + (int) $radius / abs( cos( deg2rad( (float) $center_lat ) ) * 69 );

$sqlsquareradius = "
SELECT 
post_id, lat, lng
FROM
wp_geodatastore
WHERE
lat BETWEEN ".$lat1." AND ".$lat2."
AND
lng BETWEEN ".$lng1." AND ".$lng2."
"; // End $sqlsquareradius

// Create sql for circle radius check
$sqlcircleradius = "
SELECT
t.post_id,
3956 * 2 * ASIN(
    SQRT(
        POWER(
            SIN(
                ( ".(float) $center_lat." - abs(t.lat) ) * pi() / 180 / 2
            ), 2
        ) + COS(
            ".(float) $center_lat." * pi() / 180
        ) * COS(
            abs(t.lat) * pi() / 180
        ) * POWER(
            SIN(
                ( ".(float) $center_lng." - t.lng ) * pi() / 180 / 2
            ), 2
        )
    )
) AS distance
FROM
(".$sqlsquareradius.") AS t
HAVING
distance <= ".(int) $radius."
ORDER BY distance
"; // End $sqlcircleradius


$result = mysql_query($sqlcircleradius);

$row = mysql_fetch_array( $result );

while($row = mysql_fetch_array( $result )) {
// the contents of each row
$post_ids[] = $row['post_id'];
}

There was 1 formula that I tried that was suggested by Mike Pelley here: Geolocation SQL query not finding exact location

This formula seemed to show markers that were very close to the users entered location but missed out others that should have been displayed within the given radius. To clear up any confusion this is the code I used:

$center_lat = $_GET["lat"];
$center_lng = $_GET["lng"];
$radius = $_GET["radius"];

$sql = "
SELECT post_id, lat, lng, 
truncate((degrees(acos( sin(radians(lat)) 
* sin(radians(".$center_lat.")) 
+ cos(radians(lat)) 
* cos(radians(".$center_lat.")) 
* cos(radians(".$center_lng." - lng) ) ) ) 
* 69.09*1.6),1) as distance 
FROM wp_geodatastore HAVING distance <= ".$radius." ORDER BY distance desc
"; // End $sqlcircleradius


$result = mysql_query($sql);

$row = mysql_fetch_array( $result );

while($row = mysql_fetch_array( $result )) {
// Print out the contents of each row
$post_ids[] = $row['post_id'];
}

THE REQUEST

Basically I would like to know why neither of these blocks of code are displaying the correct markers. If anyone can suggest any improvements to the code or can point me towards some resource that I might have missed that would be great

EDIT

Thought my psudeo answer was working but as it turns out that was still having problems. I’ve ended up going for a very different tack now and I’m using a very good jquery store locator which can be found here: http://www.bjornblog.com/web/jquery-store-locator-plugin

Won’t work for every project out there but for my needs it’s perfect (and works!)

Related posts

Leave a Reply

5 comments

  1. EDIT
    This location-finder comes up often enough that I’ve written an article on it.

    http://www.plumislandmedia.net/mysql/haversine-mysql-nearest-loc/

    Original Post

    Let’s start by dealing with the haversine formula once for all, by putting it into a stored function so we can forget about its gnarly details. NOTE: This whole solution is in statute miles.

    DELIMITER $$
    
    CREATE
        FUNCTION distance(lat1 FLOAT, long1 FLOAT, lat2 FLOAT, long2 FLOAT)
        RETURNS FLOAT
        DETERMINISTIC NO SQL
        BEGIN
        RETURN (3959 * ACOS(COS(RADIANS(lat1)) 
                     * COS(RADIANS(lat2)) 
                     * COS(RADIANS(long1) - RADIANS(long2)) 
                     + SIN(RADIANS(lat1)) 
                     * SIN(RADIANS(lat2)) 
                    )); 
        END$$
    
    DELIMITER ;
    

    Now let’s put together a query that searches on the bounding box, and then refines the search with our distance function and orders by distance

    Based on the PHP code in the your question:

    Assume $radius is your radius, $center_lat, $center_lng is your reference point.

    $sqlsquareradius = "
    SELECT post_id, lat, lng
      FROM
    (
        SELECT post_id, lat, lng,
               distance(lat, lng, " . $center_lat . "," . $center_lng . ") AS distance
          FROM wp_geodatastore
         WHERE lat >=  " . $center_lat . " -(" . $radius . "/69)
           AND lat <=  " . $center_lat . " +(" . $radius . "/69)
           AND lng >=  " . $center_lng . " -(" . $radius . "/69)
           AND lng <=  " . $center_lng . " +(" . $radius . "/69)
    )a
    WHERE distance <= " . $radius . "
    ORDER BY distance
    ";
    

    Notice a few things about this.

    First, it does the bounding box computation in SQL rather than in PHP. There’s no good reason for that, except keeping all the computation in one environment. (radius / 69) is the number of degrees in radius statute miles.

    Second, it doesn’t fiddle with the size of the longitudinal bounding box based on latitude. Instead it uses a simpler, but slightly too large, bounding box. This bounding box catches a few extra records, but the distance measurement gets rid of them. For your typical postcode / store finder app the performance difference is negligible. If you were searching many more records (e.g. a database of all utility poles) it might not be so trivial.

    Third, it uses a nested query to do the distance elimination, to avoid having to run the distance function more than once for each item.

    Fourth, it orders by distance ASCENDING. This means your zero-distance results should show up first in the result set. It usually makes sense to list nearest things first.

    Fifth, it uses FLOAT rather than DOUBLE throughout. There’s a good reason for that. The haversine distance formula is not perfect, because it makes the approximation that the earth is a perfect sphere. That approximation happens to break down at roughly the same level of accuracy as the epsilon for FLOAT numbers. So DOUBLE is deceptive numerical overkill for this problem. (Don’t use this haversine formula to do civil engineering work like parking lot drainage, or you will get big puddles a couple of epsilon, a few inches, deep, I promise.) It’s fine for store-finder applications.

    Sixth, you are definitely going to want to create an index for your lat column. If your table of locations doesn’t change very often, it will help to create an index for your lng column as well. But your lat index will give you most of your query performance gain.

    Lastly, I tested the stored procedure and the SQL, but not the PHP.

    Reference: http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-MySQL
    Also my experience with a bunch of proximity finders for health care facilities.

    ————— EDIT ——————–

    If you don’t have a user interface that lets you define a stored procedure, that’s a nuisance. At any rate, PHP lets you use numbered parameters in the sprintf call, so you can generate the whole nested statement like this. NOTE: You might need %$1f etc. You’ll need to experiment with this.

    $sql_stmt = sprintf ("
      SELECT post_id, lat, lng
        FROM
      (
        SELECT post_id, lat, lng,
               (3959 * ACOS(COS(RADIANS(lat)) 
                     * COS(RADIANS(%$1s)) 
                     * COS(RADIANS(lng) - RADIANS(%$2s)) 
                     + SIN(RADIANS(lat)) 
                     * SIN(RADIANS(%$1s)) 
                ))
               AS distance
          FROM wp_geodatastore
         WHERE lat >=  %$1s -(%$3s/69)
           AND lat <=  %$1s +(%$3s/69)
           AND lng >=  %$2s -(%$3s/69)
           AND lng <=  %$2s +(%$3s/69)
      )a
       WHERE distance <= %$3s
       ORDER BY distance
    ",$center_lat,$center_lng, $radius);
    
  2. Here’s a solution I used successfully for a while in my own geo proximity calculations:

    /**
     * This portion of the routine  calculates the minimum and maximum lat and
     * long within a given range.  This portion of the code was written
     * by Jeff Bearer (http:return true;//www.jeffbearer.com).
     */
    
    $lat = somevalue;      // The latitude of our search origin
    $lon = someothervalue; // The longitude of our search origin
    $range = 50;   // The range of our search, in miles, of your zip
    
    // Find Max - Min Lat / Long for Radius and zero point and query only zips in that range.
    $lat_range = $range / 69.172;
    $lon_range = abs($range / (cos($lon) * 69.172));
    $min_lat = number_format($lat - $lat_range, '4', '.', '');
    $max_lat = number_format($lat + $lat_range, '4', '.', '');
    $min_lon = number_format($lon - $lon_range, '4', '.', '');
    $max_lon = number_format($lon + $lon_range, '4', '.', '');
    
    /* Query for matching zips:
    
        SELECT post_id, lat, lng
        FROM wp_geodatastore
        WHERE
        lat BETWEEN $min_lat AND $max_lat
        AND lng BETWEEN $min_lon AND $max_lon
    */
    
  3. This is code from a working production system,

    6371.04 * acos(cos(pi()/2-radians(90-wgs84_lat)) * cos(pi()/2-radians(90-$lat)) * cos(radians(wgs84_long)-radians($lon)) + sin(pi()/2-radians(90-wgs84_lat)) * sin(pi()/2-radians(90-$lat))) as distance
    

    Uses a different distance formular, but for a store locator the difference is minimal.

  4. Thinking a little laterally I’ve come up with a ‘sort of’ solution to the problem of the missing markers. The two equations I posted originally gave the correct results but each missed out either markers close to the target or on the edges of the search radius

    It’s not very elegant but I figured that running both equations and producing 2 arrays which I then combined (removing any duplicates) would give me all the markers I’m looking for. This does work (obviously a performance hit but it’s not a high traffic application) so I’ll work with this for the time being but I’m still after a more practical solution if anyone has one!

  5. You can try my class at http://www.phpclasses.org/package/6202-PHP-Generate-points-of-an-Hilbert-curve.html. It uses the harvesine formula and a hilbert curve to compute a quadkey. You can then search the quadkey from left to right. Every position of the key is a point on the monster curve. A better explanation of the curve can be found at Nick’s spatial index quadtree hilbert curve blog. It’s like using the spatial index extension from mysql but you have more control. You can use a z curve or moore curve or you can change the look.