Subfields of Spatial AI
Practically speaking, there are already a number of fields that could fall under “spatial AI.” It’s worth listing and talking through them as a starting point for designing new AI.
Most of these have only a superficial sense of “space”: their defining quality is often merely that they work with data carrying latitude-longitude or x-y-z coordinates.
Good But Not The Whole Picture
We can get a lot of utility from these methods, but none of them really speaks to the essence of what spatial AI can be: a generalized theory and practical field that unpacks the underlying patterns of how we reason about space.
The Evolving List
- spatial computing (a term originally out of MIT in 2003 and recently popularized by Apple)
- spatial analytics (e.g., depth estimation)
- 3D scene comprehension
- 3D object recognition (e.g., scan-to-BIM, point-cloud-to-object)
- algorithms supporting virtual reality, mixed reality, augmented reality
- body and motion tracking
- autonomous driving
- robotics
- smart logistics (e.g., supply chain optimization, planning)
- generative 3D modeling (e.g., text-to-point-cloud, text-to-object)
- generative layouts (e.g., text-to-floorplan, parametric floor plans)
- gaming AI
- collective intelligence (e.g., swarms and flocking behavior, agent-based models, crowd-sourced information like Waze, crowd simulations, organizational modeling, etc.)
- surveillance AI
- geospatial AI
- urban AI
Some of them use AI to navigate the peculiarities of everyday physical-temporal space, like “robotics.” They leverage techniques like computer vision to make sense of a physical space and even determine spatial arrangements.
Others make predictions in virtual space (VR/MR/AR), and still others optimize decision-making in complex systems (urban AI).
Some of what these techniques do is seemingly straightforward, like translating one medium or data domain to another, filling in some blanks along the way.
“Spatial analytics” is a collection of algorithms like “depth estimation,” which starts with a raster image of a physical scene, then predicts a physical distance from the camera for each pixel. Text-to-point-cloud attempts to infer the best set of points representing a physical or virtual object given a human-provided text prompt, like “a mid-century modern table lamp.”
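To make the first of these concrete, here is a minimal depth-estimation sketch, assuming the Hugging Face transformers library with a pretrained DPT model (Intel/dpt-large); “scene.jpg” is a placeholder path, and note that models like this predict relative depth rather than true metric distance.

```python
# Minimal monocular depth-estimation sketch.
# Assumes `transformers`, `torch`, and `Pillow` are installed;
# "scene.jpg" is a placeholder image path.
from transformers import pipeline
from PIL import Image

# Load a pretrained depth model (Intel/dpt-large is one option).
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")

image = Image.open("scene.jpg")
result = depth_estimator(image)

# The pipeline returns raw per-pixel predictions plus a normalized
# grayscale image for visualization.
predicted_depth = result["predicted_depth"]  # torch.Tensor
depth_image = result["depth"]                # PIL.Image
depth_image.save("scene_depth.png")
```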
And some have inverses. “3D object recognition” attempts to predict the name of a physical object given a collection of points in Cartesian space.
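As a toy illustration of that inverse direction, the sketch below labels a point cloud by comparing crude bounding-box proportions against hand-picked prototypes. Real recognizers learn their features from data (e.g., PointNet); the prototype values here are purely hypothetical.

```python
# Toy point-cloud "recognition": classify an (N, 3) array of Cartesian
# points by matching simple shape statistics to labeled prototypes.
import numpy as np

def shape_features(points: np.ndarray) -> np.ndarray:
    """Bounding-box extents of the cloud, normalized to unit scale."""
    extents = points.max(axis=0) - points.min(axis=0)
    return extents / extents.max()

# Hypothetical prototypes: relative bounding-box proportions per class.
PROTOTYPES = {
    "table lamp": np.array([0.4, 0.4, 1.0]),  # tall and narrow
    "tabletop":   np.array([1.0, 1.0, 0.1]),  # flat and wide
}

def recognize(points: np.ndarray) -> str:
    feats = shape_features(points)
    return min(PROTOTYPES, key=lambda name: np.linalg.norm(feats - PROTOTYPES[name]))

# A fake "scan" of a lamp: points stretched along the vertical axis.
rng = np.random.default_rng(0)
lamp = rng.normal(scale=[0.2, 0.2, 0.5], size=(500, 3))
print(recognize(lamp))  # -> "table lamp"
```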
It’s as if these AIs are attempting to mimic humans’ natural abilities; we can typically guess relative depths from two-dimensional photographs (e.g., “Is the person in the photo really taller than the Eiffel Tower, or is she just much closer?”). And with the right training and software, we could model a virtual lamp for someone from their design notes.
Similarly, gaming AI attempts to make virtual characters behave as if they were controlled by human players.
Still others are optimization algorithms, such as those that recommend the best available paths through complex, dynamic environments, like the ones autonomous vehicles face.
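As a minimal sketch of that kind of optimization, here is A* search over a toy occupancy grid. Real autonomous-vehicle planners reason over far richer, dynamic state spaces; the grid, start, and goal below are made up.

```python
# A* pathfinding on a 2D grid (0 = free cell, 1 = blocked).
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path from start to goal; assumes goal is reachable."""
    rows, cols = len(grid), len(grid[0])
    frontier = [(0, start)]            # (priority, cell) min-heap
    came_from = {start: None}
    cost = {start: 0}
    while frontier:
        _, current = heapq.heappop(frontier)
        if current == goal:
            break
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost[current] + 1
                if (nr, nc) not in cost or new_cost < cost[(nr, nc)]:
                    cost[(nr, nc)] = new_cost
                    # Manhattan distance: an admissible heuristic on a grid.
                    h = abs(goal[0] - nr) + abs(goal[1] - nc)
                    heapq.heappush(frontier, (new_cost + h, (nr, nc)))
                    came_from[(nr, nc)] = current
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = came_from[node]
    return path[::-1]

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))
```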
And as with all types of AI, the above of course depend on what we mean by “artificial intelligence” and, more critically, what we mean by “spatial.”
Some of these start to show how environments themselves can become “intelligent,” like the emerging concept of “urban AI,” which speaks to the value of data flows and automation in managing a city’s operations.