Introduction

Geoprox is a geospatial indexing and querying API designed to simplify and optimize location-based operations.

It provides efficient methods for encoding and decoding geohashes, finding neighboring regions, and managing geospatial indexes for a variety of applications such as food delivery services and real-time inventory tracking.

Key Features:

  • Geohash Encoding and Decoding: Convert geographic coordinates into geohashes and retrieve coordinates from geohashes for easy location-based operations.
  • Neighbor Identification: Quickly find neighboring geohashes to support spatial queries and region-based operations.
  • Geospatial Index Management: Create, update, and delete geospatial indexes to organize and query location-based data efficiently.
  • Proximity Searches: Perform radius-based searches to locate nearby resources or inventory items.

Benefits:

  • Efficient Spatial Data Handling: Streamline operations by leveraging geohashes for compact and hierarchical representation of locations.
  • Flexible Querying: Use proximity-based queries to enhance real-time decision-making and operational efficiency.
  • Easy Integration: Seamlessly integrate Geoprox with applications for real-time tracking, delivery services, and more.

Contributing

Geoprox is an open-source project and welcomes contributions from the community. If you’d like to contribute, please visit the GitHub repository to find the source code, report issues, and submit feature requests. For detailed guidance on contributing, please refer to the CONTRIBUTING document.

License

Geoprox is licensed under the Apache 2.0 or MIT License (your choice).

How Geoprox Works

The Problem

Initially, for the May 2024 Rust Indy.rs meetup, the proof of concept problem began with ride-share pairing like Uber or Lyft.

Ride pairing depends on being able to track and pair a set of drivers within the vicinity of some location (i.e., the pickup location of an order).

Since then, the problem has generalized to how can we efficiently track and retrieve resources geographically near some location.

Objectives

  • Keep track of the approximate location of objects.
  • Search should return a set of objects within a radius of the search location.

Solution

Index the objects and hash their key and geographical position. Using a geohash to encode the approximate location, we can map geohash prefixes into the set of objects contained in the geohash using a Prefix Tree stored in-memory.

We can efficiently partition the search space and perform a nearest neighbor search on a merged set of objects in the search region and return the results.

What is a Geohash?

A geohash is a string representation of a geographic location, encoded as a series of characters. This encoding allows for efficient storage and querying of spatial data. Geohashes are hierarchical, meaning that as the string length increases, the represented area becomes more precise.

How Geohashes Work

Geohashes divide the world into a grid of cells, each with a unique identifier (encoded in base 32). The precision of the geohash can be adjusted by changing its length:

  • Short geohashes represent larger areas.
  • Long geohashes represent smaller, more precise areas.
Geohash LengthLat BitsLng BitsLat ErrorLng ErrorKm Error
123±23±23±2,500 km (1,600 mi)
255±2.8±5.6±630 km (390 mi)
378±0.70±0.70±78 km (48 mi)
41010±0.087±0.18±20 km (12 mi)
51213±0.022±0.022±2.4 km (1.5 mi; 2,400 m)
61515±0.0027±0.0055±0.61 km (0.38 mi; 610 m)
71718±0.00068±0.00068±0.076 km (0.047 mi; 76 m)
82020±0.000085±0.00017±0.019 km (0.012 mi; 19 m)

For example, the following geohashes are all within a maximum distance of +/- 78km (48mi) from each other:

  • s008nb00j8
  • s00twy01mt
  • s00j8n012j

However, these geohashes represent completely different regions:

  • s128nb00j8
  • s34twy01mt
  • s56j8n012j

Benefits of Using Geohashes

  • Compact Representation: Efficiently stores geographic coordinates.
  • Hierarchical Structure: Allows for easy aggregation and precision adjustment.
  • Proximity Queries: Simplifies the process of finding nearby locations.

Check out the Geohash Explorer for an interactive map to help visualize geohashing.

Prefix Searching

Prefix search is a method for quickly finding all entries in a dataset that share a common prefix. In the context of geohashes, it allows us to retrieve all geohashes that start with the same characters, which is useful for geographical proximity searches.

Trie Tree

We can store Geohashes in a Trie Tree data structure. Tries enable us to efficiently retrieve geohashes with a common prefix in O(L) time, where L is the length of the prefix being searched.

Key Features of Trie Tree:

  • Efficient Retrieval: Quickly find geohashes sharing a common prefix.
  • Time Complexity: O(L), where L is the prefix length.

Patricia Trie

A Patricia Trie, also known as a Radix Tree, is a type of Trie Tree that optimizes space by merging common prefixes into single nodes. This further enhances the efficiency of prefix searching by reducing the number of nodes.

Key Features of Patricia Trie Tree:

  • Space Optimization: Merges common prefixes to save space.
  • Efficient Lookup: Retrieves geohashes with common prefixes efficiently.
  • Time Complexity: Best case O(log N) where N is the length of the longest geohash stored, and worst case O(L) where L is the length of the prefix.

Comparisons

Skip List

A Skip List is a probabilistic data structure that allows for fast search, insertion, and deletion operations. It consists of multiple levels of linked lists, with each higher level acting as an "express lane" for elements lower down.

  • Pros:

    • Efficient Average-case Performance: Provides good average-case performance for search operations, typically O(log N) time complexity.
    • Cache-friendly: The contiguous arrangement of nodes makes Skip Lists more cache-friendly than tree-based structures.
  • Cons:

    • Space Overhead: Requires additional space to maintain pointers for multiple levels, resulting in higher space usage compared to a Patricia Trie.
    • Probability Calculations: Maintaining balance requires careful calculation of node levels' probabilities, involving some understanding of probability theory.

B-Trees and MemTables

B-Trees are balanced tree data structures that maintain sorted data and enable searches, sequential access, insertions, and deletions with logarithmic time complexity. MemTables are in-memory data structures used in conjunction with B-Trees to facilitate efficient storage and retrieval of key-value pairs before persisting data to disk.

  • Pros:

    • Efficient Disk-based Storage: Ideal for managing large datasets and effective for disk-based storage solutions.
    • Range Queries and Sorted Data Retrieval: Provides excellent performance for range queries and retrieving sorted data efficiently.
  • Cons:

    • Memory Usage: MemTables require significant memory to store data before flushing to disk, which can be a limitation for very large datasets.
    • Complexity in Merging: Managing and merging data between MemTables and disk-based storage can introduce complexity and overhead in maintaining data consistency.

Why Patricia Trie?

For geospatial data, where object locations might change frequently, in-memory solutions like Patricia Trie are preferred. They offer superior speed and efficiency in managing dynamic data, avoiding the latency associated with disk I/O operations.

Insertion and Query

This section outlines the fundamental design strategy behind Geoprox, detailing the algorithms used for inserting and querying geospatial data.

The core approach involves encoding geographic coordinates into geohashes, utilizing a Patricia Trie for efficient storage and retrieval, and employing a KD-Tree for fast nearest neighbor searches. These techniques collectively enable Geoprox to efficiently manage and query large volumes of geospatial data.

Data Structures

  • Approximate Position Tracking:

  • Key/Value Storage:

    • Use StringPatriciaMap for efficient key/value storage.
    • Format: geohash_region => HashSet(object_key)

Insert Procedure

  1. Hash Object's Latitude/Longitude:

    • Hash the object's latitude and longitude up to a certain depth (e.g., 6). This depth, called insert_depth, should be sufficient to ensure the resulting geohash region is large enough to store the density of objects in that area.
  2. Upsert Object Key:

    • Insert or update the object_key in the StringPatriciaMap with its geohash_region.
    • If object_key had an existing geohash_region, remove its entry from that region in the StringPatriciaMap.

Search Procedure

  1. Initiate Search:

    • A search is initiated at a specific location with a given search range (in kilometers).
  2. Encode Latitude/Longitude:

    • Encode the latitude and longitude into a geohash up to a specific depth (e.g., 6). This depth, called search_depth, should ensure the geohash region is small enough to fit within the search radius.
  3. Optimize Geohash Search Region:

    • Optimize the geohash search region by expanding it until it covers the search area.
    • If the geohash search region does not contain the search radius, remove the last character (effectively expanding the region) until it fits within the search area.
  4. Partition StringPatriciaMap:

    • Partition the StringPatriciaMap by the common prefix of the optimized geohash.
  5. Construct KdTree:

    • Iterate over the partition and construct a KdTree.
    • Store objects and their approximate locations in the KdTree.
  6. Compute Nearest Neighbor:

    • Perform nearest neighbor computation with the search requirements against the KdTree to get the final result.

Getting Started

Geoprox comes in different flavors depending on what you're cooking up. If you're looking to dive deep and use core features directly in your Rust projects, the geoprox-core crate is your go-to. It's packed with all the fundamental data structures that make Geoprox tick.

But if you're aiming for a more out-of-the-box experience, the main geoprox package is where the magic happens. It's the CLI runtime designed to sidecar with your other application services, making it super easy to integrate Geoprox into your existing setup.

Crates

Installation

To install geoprox, use the following command:

cargo install geoprox

Usage

Geoprox provides several commands to start the server and work with geohashes:

geoprox help
Usage: geoprox [COMMAND]

Commands:
  run     Start Geoprox server
  encode  Hash latitude/longitude into geohash
  decode  Decode geohash into approximate longitude/latitude
  spec    Generate an OpenAPI specification file
  help    Print this message or the help of the given subcommand(s)

Options:
  -h, --help     Print help
  -V, --version  Print version

To start using Geoprox, you can run the following command to start the server:

geoprox run

You can also encode and decode geohashes directly from the CLI:

# Encode latitude and longitude into geohash
geoprox encode --lat=37.7749 --lng=-122.4194 --depth=5

# Decode geohash into latitude and longitude
geoprox decode 9q8yy

For detailed help on any command, use the help command:

geoprox help encode

Configuration

Geoprox run command allows specifying a configuration file using the -c or --config option or set the GEOPROX_CONFIG environment variable. This file can contain various settings to customize the behavior of the Geoprox server and commands. The configuration can be provided in any common format such as YAML, TOML, JSON, or INI.

Example Usage

Specify config using -c flag:

# from current working directory
geoprox run -c geoprox.toml

# from absolute path
geoprox run -c /my/custom/config.yaml

or using environment variables:

export GEOPROX_CONFIG='/my/custom/config.json'

geoprox run

Example Config

Here's an example configuration file in TOML format:

# All settings are optional; these are the default values.
[server]
# The address the server will bind to
http_addr = '0.0.0.0'
# The port the server will listen on
http_port = 5000
# Request timeout
timeout = '10s'

[shard]
# Determines the default geohash length for inserts
insert_depth = 6
# Determines the default geohash length for searches
search_depth = 6
# Specifies the default number of results returned in range queries
default_count = 100
# Toggles the default sorting behavior for query results
default_sorted = false

[server.snapshots]
# Toggle snapshot usage
disabled = false
# Directory where snapshots will be stored
path = '/var/lib/geoprox/snapshots'
# How often snapshots will be taken
every = '30s'

Environment Variables

These are the currently supported environment variables. They will take precedence over settings defined in the configuration file.

Environment VariableDescriptionDefault Value
GEOPROX_CONFIGSpecifies the configuration file path.
GEOPROX_HTTP_ADDRThe address the server will bind to.0.0.0.0
GEOPROX_HTTP_PORTThe port the server will listen on.5000

Fine Tuning

The insert_depth and search_depth settings control the precision of geohashing and impact the performance of distance calculations and search queries.

Optimizing insert_depth

Higher Insert Depth:

  • Description: Objects are grouped into smaller, more precise regions.
  • Impact: Searching takes slightly longer, but distance calculations are more accurate.
  • Use Case: Ideal for scenarios where precise distance measurements are crucial.

Lower Insert Depth:

  • Description: Objects are grouped into larger regions.
  • Impact: Search performance improves, but distance calculations become less accurate.
  • Use Case: Best for cases where fast general distance estimates are acceptable.

Optimizing search_depth

Higher Search Depth:

  • Description: The initial search region is smaller and more precise.
  • Impact: Search queries take slightly longer, but results are more accurate.
  • Use Case: Suitable for narrow range queries where high accuracy is needed.

Lower Search Depth:

  • Description: The initial search region is larger, leading to faster searches.
  • Impact: Search speed improves, but accuracy may be reduced.
  • Use Case: Ideal for wide range queries where speed is prioritized over precision.

Docker Guide

The Geoprox Docker image (ezrasingh/geoprox) is available on two registries:

Getting Started

To quickly get started with Docker Compose, skip to the Quick Start Guide.

Pull the Geoprox container from one of the available repositories.

# from Docker Hub
docker pull ezrasingh/geoprox:latest

# or from GitHub Container Registry
docker pull ghcr.io/ezrasingh/geoprox:latest

To run the container with default configurations, use:

docker run -t -p 5000:5000 ezrasingh/geoprox:latest

This command will start the Geoprox container with the default settings, exposing port 5000 for access.

To use another port on your machine try -p <my-port>:5000

Variants & Tagging

The Geoprox image supports two variant tags: alpine and debian.

By default, the latest tag uses the alpine variant. Latest is built from the main branch.

To pin your Geoprox instance to a specific version and variant, use the format <version>-<variant>, such as v0.3.0-debian.

If you specify only the version, like v0.3.0, it will default to the alpine variant, resulting in v0.3.0-alpine.

Check the respective registry for available tags.

Customizing the Configuration

For complete settings and configuration details, refer to the Geoprox CLI README.

Geoprox allows specifying a configuration file using the -c or --config option or set the GEOPROX_CONFIG environment variable. Geoprox will automatically detect and parse configuration files in formats like YAML, TOML, JSON, or INI if they are named geoprox.yaml, geoprox.toml, geoprox.json, or geoprox.ini, respectively.

For example, if you have a configuration file named geoprox.toml in the container working directory (/var/lib/geoprox), you can run the container with the following command:

docker run -t -p 5000:5000 \
    -v $(pwd)/geoprox.toml:/var/lib/geoprox/geoprox.toml:ro \
    ezrasingh/geoprox:latest \
    -c geoprox.toml

In this command:

  • -v $(pwd)/custom-config.yaml:/some/path/custom-config.yaml:ro mounts the local configuration file into the container.

If you need to specify a different path or file name for your configuration, use the -c or --config option:

docker run -t -p 5000:5000 \
    -v $(pwd)/custom-config.yaml:/some/path/custom-config.yaml:ro \
    ezrasingh/geoprox:latest \
    -c /some/path/custom-config.yaml

In this command:

  • -c /some/path/custom-config.yaml specifies the configuration file to be used by the Geoprox server.

Or use the GEOPROX_CONFIG environment variable:

docker run -t -p 5000:5000 \
    -v $(pwd)/custom-config.yaml:/some/path/custom-config.yaml:ro \
    -e GEOPROX_CONFIG='/some/path/custom-config.yaml' \
    ezrasingh/geoprox:latest

In this command:

  • -e GEOPROX_CONFIG='/some/path/custom-config.yaml' specifies the configuration file to be used by the Geoprox server.

Quick Start

To quickly get started with Geoprox using Docker, follow these steps to configure and run the server with Docker Compose.

The provided Docker Compose configuration serves as a boilerplate to showcase how Geoprox can be configured and allows you to get an instance up and running fast for evaluation purposes.

Docker Compose

Use the following command to start the Geoprox service:

docker-compose -f quickstart/docker-compose.yml up

This command will start the Geoprox service using the configuration specified in quickstart/docker-compose.yml. The configuration file in this setup is a basic example, allowing you to easily evaluate Geoprox with default settings.

For custom configurations, you can modify the Docker Compose file or replace it with your own configuration as needed.

Client SDK

HTTP Client libraries are generated from an OpenAPI specification using OpenAPI Generator.

HTTP Client Libraries

Availble clients can be found at contrib/client-sdk, along with steps for generating your own client if needed.

LanguagePackage NameRepository Link
C#GeoproxClientGeoproxClient
Dartgeoprox_clientgeoprox_client
Dart (Dio)geoprox_client_diogeoprox_client_dio
Elixirgeoprox_clientgeoprox_client
Gogeoprox_clientgeoprox_client
Groovygeoprox-client-groovygeoprox-client-groovy
Kotlingeoprox_clientgeoprox_client
PHPGeoproxClientGeoproxClient
Pythongeoprox_clientgeoprox_client
Rubygeoprox_clientgeoprox_client
Rustgeoprox-clientgeoprox-client
TypeScript (Axios)geoprox-client-ts-axiosgeoprox-client-ts-axios
TypeScript (Fetch)geoprox-client-ts-fetchgeoprox-client-ts-fetch
TypeScript (Node)geoprox-client-ts-nodegeoprox-client-ts-node
TypeScript (RxJS)geoprox-client-ts-rxjsgeoprox-client-ts-rxjs