This is the second article in our ongoing series “Under the Hood”, where we explore how the mobile location data ecosystem works, and what is required to turn that information into a precise understanding of physical world behavior.
In our first article “The Blue Dot Effect”, we discussed how precision filtering techniques can solve for location signals with a wide margin of error, sharing some of the approaches built into the NinthDecimal platform. In this next article, we discuss how these techniques combine with different data processing approaches to determine physical world context.
“The Drift Zone”
Location data is one of the hottest topics in our industry today, which is not surprising given the explosion of geo-enabled apps and countless business and consumer services powered by this data. The strides location technologies have made and continue to make are tremendous, considering the signal has only really been available for consumer applications for less than a decade. Before the advent of the iPhone and the smartphone revolution, mobile carriers offered early location services like emergency 911 or family-finder services. The launch of the iPhone in 2007 included access to cell tower triangulation data to power rudimentary consumer-based location services. Of course, this kind of data alone was only accurate to about 1,000 meters, or a little more than 10 football fields. Not very useful for today’s consumer applications and location services.
The good news is that the accuracy and precision of location data have dramatically improved and continue to evolve with each generation of devices and operating systems. A study conducted in 2009 by Paul Zandbergen of the Department of Geography at the University of New Mexico, together with a follow-up two years later, found a significant improvement in accuracy in those first two years alone. Using an iPhone 3G, Zandbergen found that the smartphone could determine its location to within 14.4 meters 95% of the time, with a median accuracy of 6.9 meters. Just two years later, he repeated the study with Motorola and Sanyo devices and found that location accuracy had improved to a range of 5 meters to 8.5 meters.
In the five years since Zandbergen’s 2011 study, there have been numerous new iPhone and Android models, and even more numerous versions of the operating systems, each with new improvements in location accuracy. Location services today include a combination of signals including GPS, WiFi, and cell tower data, as well as compass, barometer, accelerometer and beacon data. Continued testing by the leading standalone GPS device manufacturers (such as Trimble, Fulcrum, and others) has found that location services on mobile phones today are capable of an average accuracy range of three to five meters. There also exists beacon technology deployed today that can support geolocation precision measured in centimeters, enabling aisle-level location accuracy within a store.
What is the Drift Zone?
The Drift Zone is this accuracy range for mobile location signals generated by today’s mobile devices. Think of it as a “plus or minus” error band. Today, this averages between 3 and 5 meters as discussed above. Recognizing the state of the industry, we seek to account for and filter out location data with particularly poor accuracy. Of course, there are tried-and-true methods of geographic filtering such as eliminating centroids for states, DMAs, zip codes, and so forth. However, we can do more with data science and distinguish low drift from high drift by using algorithmic techniques like frequency counting, clustering, and other heuristics. Our “Blue Dot Effect” article, the first in our Under the Hood series, is one example of how data science can identify and filter location data with large drift zones. In doing so, platforms can work with more curated datasets, and these datasets have smaller margins of error than the average for the overall mobile ecosystem.
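As a rough illustration of centroid elimination and frequency counting, the Python sketch below drops points that sit on a known centroid or that repeat far more often than real movement would plausibly produce. It is not a description of NinthDecimal's actual pipeline: the function name `filter_suspect_points`, the `KNOWN_CENTROIDS` set and its single entry, and the `max_share` threshold are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical centroid list (illustrative value only, not real reference data).
KNOWN_CENTROIDS = {(37.7749, -122.4194)}


def filter_suspect_points(points, max_share=0.01, decimals=5):
    """Return the points that pass two simple drift heuristics.

    points    -- list of (lat, lon) tuples in decimal degrees
    max_share -- assumed threshold: a coordinate that carries more than this
                 share of all points is treated as a default or fallback value
    decimals  -- rounding used when comparing coordinates
    """
    rounded = [(round(lat, decimals), round(lon, decimals)) for lat, lon in points]
    counts = Counter(rounded)                      # frequency counting
    limit = max_share * len(rounded)
    centroids = {(round(a, decimals), round(b, decimals)) for a, b in KNOWN_CENTROIDS}

    kept = []
    for point, key in zip(points, rounded):
        if key in centroids:                       # geographic centroid filter
            continue
        if counts[key] > limit:                    # implausibly frequent coordinate
            continue
        kept.append(point)
    return kept
```

A production system would likely combine heuristics like these with clustering and device-reported accuracy; the thresholds here are placeholders to tune against real data.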
How Does the Drift Zone Relate to Decimal Points?
Latitude and longitude coordinates are typically expressed in decimal degrees, such as (37.78905, -122.40461). The number of digits after the decimal point indicates the approximate level of precision of the coordinates. Three decimals indicate precision down to roughly 110 meters, four decimals indicate roughly 11-meter precision, five decimals indicate roughly 1-meter precision, and so on. Thus, we can equate the number of decimal places with the approximate size of a geographic area centered around that point. For example, three decimal places (110 meters) is about the size of a large city block, while two decimal places (1.1 kilometers) can represent a city. So, you could say: “I am at coordinates (37.79, -122.40)”, which translates to “somewhere in San Francisco”. Or, with a more precise measurement you could say: “I am at coordinates (37.78905, -122.40461)”, which is the ping pong table in our San Francisco office. Certainly, five decimal places is fairly precise, while two decimal places leave a high level of uncertainty in pinpointing the subject.
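The decimal-place figures above can be reproduced with a short calculation. The Python sketch below assumes roughly 111,320 meters per degree of latitude (a common approximation) and scales the east-west distance by the cosine of the latitude, since lines of longitude converge toward the poles. The function name `precision_meters` is made up for this example.

```python
import math

# Approximate length of one degree of latitude in meters (assumed constant).
METERS_PER_DEGREE_LAT = 111_320


def precision_meters(decimals, latitude_deg=0.0):
    """Return (north_south_m, east_west_m) covered by one unit in the last decimal place."""
    step_deg = 10 ** -decimals
    north_south = step_deg * METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for d in range(2, 6):
    ns, ew = precision_meters(d, latitude_deg=37.79)
    print(f"{d} decimals: ~{ns:,.1f} m north-south, ~{ew:,.1f} m east-west")
```

At San Francisco's latitude the east-west figure comes out somewhat smaller than the north-south one, which is why the 110 / 11 / 1 meter values are best read as rules of thumb.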
But the coordinates themselves do not capture the potential error band of the signal, or what we call the Drift Zone. This will add more uncertainty to the true location. A five decimal lat/lon with a high Drift Zone is simply a 1-meter area with a margin of error far greater than the average 3 to 5 meters, while a five decimal lat/lon with a low Drift Zone represents a 1-meter area with a margin of error better than the average 3 to 5 meters. A combination of data science techniques is required to confidently assert that a mobile device-generated lat/lon coordinate is a good representation of where that device actually is in the physical world.
The Marketer’s Choice
How does all of this impact a marketer? Every marketing platform that leverages mobile location data has a different approach to processing latitude and longitude signals. The approach dictates the level of precision of a platform. It dictates how well that raw data can be contextualized, and ultimately how it is turned into something of value to a marketer, namely physical world behavior. The approaches fall into two general categories – Inside the Drift Zone or Outside the Drift Zone. Each has different margins of error every marketer should understand. Fundamentally, each marketer has to decide what their acceptable margin of error is.
Inside the Drift Zone?
NinthDecimal’s platform is built on this approach. Quite simply, we ingest mobile location data that is highly curated through a scrubbing and filtering process designed to narrow the average drift. Our platform then contextualizes that data by looking at the physical world around each data point. This process helps us identify whether a device is in a restaurant, a clothing retailer, or an automotive dealership (more on this process in future articles). Recognizing that even after all that work there is still some inherent margin of error in the data, we built a platform that processes our data Inside the Drift Zone. In other words, we process location data on a smaller basis than the typical drift zone of the location data signal. By doing so, our platform doesn’t materially add to the inherent margin of error in the signal.
So when looking at a common five decimal point latitude / longitude signal (i.e., a 1-meter area), the drift zone needs to be added to account for the inherent margin of error. Unfiltered data makes this an average of 3 to 5 meters. By applying data science to remove the signals with higher drift, that average can be further narrowed, increasing confidence in the precision of a signal. A data processing and contextualization capability within this tighter range maintains the precision of a signal. And this very narrow band doesn’t fundamentally change the context of that device’s location.
Outside the Drift Zone?
The second approach is to process location data and contextualize it based on a much larger geographic area. Tile-based systems and most geo-fence-based systems are examples of this approach. If the size of the platform’s data processing area is greater than the typical drift zone of the location data signal, this is considered Outside the Drift Zone. In this case, the processing approach increases the margin of error inherent in the signal. The greater the size of a processing area, the greater the increase in the Drift Zone. So in the case of a 100-meter processing area, the edge of that geographic space can be 80 meters away from the actual location signal in one direction, and 20 meters away in the other direction, per the illustration above. A device on one edge of that same processing area can be up to 100 meters away from the other edge, or up to about 141 meters away when measured from one corner to the opposite corner. To put that in perspective, a standard New York City block is 80 meters by 274 meters. Since the processing area is how context is applied to a location signal, this increased margin of error can dramatically alter context.
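The distances quoted for a 100-meter processing area follow from simple geometry. The Python sketch below models the processing area as a square tile, which is an assumption (geo-fences can be any shape), and computes how far a signal inside the tile can be from the tile's farthest edge and farthest corner. The function name `tile_worst_case` is hypothetical.

```python
import math


def tile_worst_case(tile_size_m, x_m, y_m):
    """Distances from a point inside a square tile to its farthest edge and corner.

    The tile's lower-left corner is the origin; x_m and y_m are offsets in meters.
    Returns (farthest_edge_m, farthest_corner_m).
    """
    if not (0 <= x_m <= tile_size_m and 0 <= y_m <= tile_size_m):
        raise ValueError("point must lie inside the tile")
    far_x = max(x_m, tile_size_m - x_m)   # farther of the left/right edges
    far_y = max(y_m, tile_size_m - y_m)   # farther of the bottom/top edges
    return max(far_x, far_y), math.hypot(far_x, far_y)


# A signal 20 m from one edge of a 100 m tile is 80 m from the opposite edge.
print(tile_worst_case(100, 20, 50))
# A signal in one corner is the full diagonal away from the opposite corner.
print(tile_worst_case(100, 0, 0))
```

Because context is assigned to the whole tile, these distances are added on top of the signal's own drift rather than replacing it.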
It really comes down to a marketer’s choice. Accurate within a few meters 99% of the time or inaccurate by potentially hundreds of meters 99% of the time?
Unusual Edge Cases
So why isn’t processing Inside the Drift Zone accurate 100% of the time? Quite simply, there are some extreme use-cases where the wrong context will inevitably be applied. Remember, we’re working in the range of a few meters here. For example, if a given lat/long data point is along the wall of Store A, the Drift Zone can extend across that wall into neighboring Store B. In this case, the margin of error in the mobile location signal may cause the platform to place the device in Store A even though in actuality it may be in Store B. However, think of your own daily experience and how often you see use cases like this, such as someone standing against the wall of a store. In the massive trove of over 1.5 trillion data points we process every month, those are limited use cases and do not materially impact the contextualization of those trillions of data points.
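The shared-wall case can be sketched in one dimension. In the Python example below, stores are modeled as intervals along a storefront strip, and a signal is treated as ambiguous whenever its drift band overlaps more than one store. The store layout, the 4-meter drift value, and the function name `candidate_stores` are all assumptions chosen for illustration.

```python
def candidate_stores(position_m, drift_m, stores):
    """Return the names of stores that the drift band around a signal overlaps.

    position_m -- reported position along the strip, in meters
    drift_m    -- plus-or-minus error band of the signal, in meters
    stores     -- dict mapping store name to (start_m, end_m)
    """
    low, high = position_m - drift_m, position_m + drift_m
    return sorted(
        name for name, (start, end) in stores.items()
        if start <= high and end >= low   # interval overlap test
    )


# Hypothetical layout: two stores sharing a wall at the 20 m mark.
stores = {"Store A": (0.0, 20.0), "Store B": (20.0, 45.0)}

print(candidate_stores(10.0, 4.0, stores))   # ['Store A']
print(candidate_stores(18.5, 4.0, stores))   # ['Store A', 'Store B']
```

Only signals within one drift width of the shared wall are ambiguous, which is why a smaller Drift Zone shrinks the share of such cases.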
Edge Cases Apply to All
That same scenario and use case applies to all methods of processing location data. Just as standing up against the edge of one store can bump a device into another store, a latitude / longitude signal from a device on the edge of a tile or geo-fence can bump that context into the neighboring tile or geo-fence. The larger the geographic area processing structure, the greater the increase in potential margin of error from that bump.
Just as location services have become more and more accurate since the iPhone first launched in 2007, they will continue to do so going forward. Already the industry is producing beacon data, which requires a platform that can process precise signals on a level far smaller than a five decimal point latitude / longitude. New advances in GPS technologies such as Differential GPS (DGPS) and Wide Area Augmentation System (WAAS) bring location accuracy down to less than 1 meter. In December last year, researchers at the University of California, Riverside, announced another breakthrough that makes GPS accurate down to within centimeters. And in March 2016, MIT researchers demonstrated that WiFi can geolocate within 65 centimeters using a single WiFi access point. All of this is making its way into your mobile device as these advances are commercialized.
At NinthDecimal, we are always planning for the future. We built a platform that processes location data Inside the Drift Zone to provide the most precise and accurate context in the industry. And we know that advances in technology are only making that already small Drift Zone smaller and smaller each year. We’re going to miss you, Drift Zone!
I) Accuracy of iPhone Locations: A Comparison of Assisted GPS, WiFi and Cellular Positioning, Transactions in GIS, 2009.
II) Positional Accuracy of Assisted GPS Data from High-Sensitivity GPS-enabled Mobile Phones, Journal of Navigation, July 2011.
III) Computationally Efficient Carrier Integer Ambiguity Resolution in Multiepoch GPS/INS: A Common-Position-Shift Approach, IEEE Transactions on Control Systems Technology, December 2015.
IV) Decimeter-Level Localization with a Single WiFi Access Point, USENIX Symposium, March 2016.
About the Author
Fairiz Azizi is currently Director of Engineering at NinthDecimal. He has over 15 years of experience in building systems at scale. He currently oversees analytics, data and platform technologies.