Insertion and Query
This section outlines the fundamental design strategy behind Geoprox, detailing the algorithms used for inserting and querying geospatial data.
The core approach involves encoding geographic coordinates into geohashes, utilizing a Patricia Trie for efficient storage and retrieval, and employing a KD-Tree for fast nearest neighbor searches. These techniques collectively enable Geoprox to efficiently manage and query large volumes of geospatial data.
Data Structures
-
Approximate Position Tracking:
- Utilize a
HashMap(* see v0.3.0 release notes) to track approximate positions. - Format:
object_key=>geohash(lat, lng)
- Utilize a
-
Key/Value Storage:
- Use
StringPatriciaMapfor efficient key/value storage. - Format:
geohash_region=>HashSet(object_key)
- Use
Insert Procedure
-
Hash Object's Latitude/Longitude:
- Hash the object's latitude and longitude up to a certain depth (e.g., 6). This depth, called
insert_depth, should be sufficient to ensure the resulting geohash region is large enough to store the density of objects in that area.
- Hash the object's latitude and longitude up to a certain depth (e.g., 6). This depth, called
-
Upsert Object Key:
- Insert or update the
object_keyin theStringPatriciaMapwith itsgeohash_region. - If
object_keyhad an existinggeohash_region, remove its entry from that region in theStringPatriciaMap.
- Insert or update the
Search Procedure
-
Initiate Search:
- A search is initiated at a specific location with a given search range (in kilometers).
-
Encode Latitude/Longitude:
- Encode the latitude and longitude into a geohash up to a specific depth (e.g., 6). This depth, called
search_depth, should ensure the geohash region is small enough to fit within the search radius.
- Encode the latitude and longitude into a geohash up to a specific depth (e.g., 6). This depth, called
-
Optimize Geohash Search Region:
- Optimize the geohash search region by expanding it until it covers the search area.
- If the geohash search region does not contain the search radius, remove the last character (effectively expanding the region) until it fits within the search area.
-
Partition
StringPatriciaMap:- Partition the
StringPatriciaMapby the common prefix of the optimized geohash.
- Partition the
-
Construct
KdTree:- Iterate over the partition and construct a
KdTree. - Store objects and their approximate locations in the
KdTree.
- Iterate over the partition and construct a
-
Compute Nearest Neighbor:
- Perform nearest neighbor computation with the search requirements against the
KdTreeto get the final result.
- Perform nearest neighbor computation with the search requirements against the