Client Background

Our client is a real estate company working on solutions to reduce the US housing shortage. They sought to identify lots in key US cities that could be suitable for the solutions they offer, and they wanted a data-driven approach to narrow the selection down to the most promising candidates.

Challenges

Multiple public and private data sources were available to assist with this task, but they were siloed, delivered in differing formats, and of varying quality. Without substantial cleaning and integration, our client could not take full advantage of these disparate sources.

Our Approach

We created a centralized database containing various geographic data, including parcels, building blueprints, zoning, topography, ownership, and valuation data.

Once the data was available in a centralized database, we developed a series of data transformations and a custom algorithm to identify the best candidate lots.

Data Collection and Storage:

The data was collected from state government websites and private data providers, then loaded into a PostgreSQL database with the PostGIS extension, which provides a rich set of tools for manipulating geospatial data effectively.
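
As a minimal sketch of this loading step, the snippet below reads a parcel shapefile with geopandas and writes it into PostGIS; the file path, table name, and connection details are placeholders rather than the client's actual setup.

```python
import geopandas as gpd
from sqlalchemy import create_engine

# Placeholder connection string; real credentials and host differ.
engine = create_engine("postgresql://user:password@localhost:5432/parcels_db")

# Read a parcel shapefile (hypothetical path) and normalize the CRS so that
# every source layer lands in the database in a consistent projection.
parcels = gpd.read_file("data/raw/county_parcels.shp")
parcels = parcels.to_crs(epsg=4326)

# Write the layer into PostGIS; geopandas maps the geometry column to a
# PostGIS geometry type automatically.
parcels.to_postgis("parcels_raw", engine, if_exists="replace", index=False)
```

Repeating this pattern for each source (zoning, topography, ownership, valuation) yields a single queryable store that the downstream transformations can build on.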

Data Analysis:

Python and dbt were used to clean, integrate, and transform the data. Our algorithm leveraged dozens of data points across multiple geospatial sources to identify the areas best suited for property development.
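
As a deliberately simplified illustration of this kind of scoring, the sketch below joins parcels to zoning polygons and ranks them on a couple of stand-in attributes; the column names, zoning codes, and weights are hypothetical, and the real algorithm draws on far more data points.

```python
import geopandas as gpd

def score_candidate_lots(parcels: gpd.GeoDataFrame, zoning: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Attach zoning to each parcel and compute a simple suitability score."""
    # Spatial join: tag each parcel with the zoning polygon it falls within.
    lots = gpd.sjoin(
        parcels, zoning[["zone_code", "geometry"]], how="inner", predicate="within"
    )

    # Keep only zoning categories that allow the intended development (illustrative codes).
    lots = lots[lots["zone_code"].isin(["R2", "R3", "MU"])]

    # Illustrative score: favor larger lots where the improvement (building) value
    # is small relative to the total assessed value, a rough proxy for underused land.
    lots["score"] = (
        0.6 * (lots["lot_area_sqft"] / lots["lot_area_sqft"].max())
        + 0.4 * (1 - lots["improvement_value"] / lots["total_value"]).clip(0, 1)
    )
    return lots.sort_values("score", ascending=False)
```

In a setup like this, dbt is a natural fit for the SQL-shaped cleaning and integration steps, while Python handles the richer geospatial operations; the project described here used both.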

Visualization:

The data was visualized in QGIS and Power BI to identify lots with development potential, giving the client a clear view of the candidate lots and enabling deeper on-demand exploration.
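
To support that on-demand exploration, scored results can be pulled back out of PostGIS and exported in a format QGIS opens directly; the sketch below assumes a hypothetical candidate_lots table produced by the scoring step.

```python
import geopandas as gpd
from sqlalchemy import create_engine

# Placeholder connection details and table name; the real schema differs.
engine = create_engine("postgresql://user:password@localhost:5432/parcels_db")

# Pull the highest-scoring candidates back out of PostGIS...
top_lots = gpd.read_postgis(
    "SELECT * FROM candidate_lots ORDER BY score DESC LIMIT 500",
    engine,
    geom_col="geometry",
)

# ...and write them to a GeoPackage that can be dropped straight into QGIS
# for interactive, map-based exploration.
top_lots.to_file("top_candidate_lots.gpkg", layer="top_candidates", driver="GPKG")
```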

Impact

  • Our client executed multiple targeted marketing campaigns using the information generated by our algorithm
  • These campaigns were successful, leading to the acquisition of multiple lots
  • The centralized database contains information and metrics for multiple US cities, and our client is using these metrics to identify the best markets for expansion

Technologies used:

PostgreSQL, PostGIS, Python, dbt, Power BI, QGIS

Key Takeaways

  • Targeted marketing campaigns driven by the algorithm's output led to the acquisition of multiple lots.
  • The centralized database holds detailed metrics for multiple US cities, enabling our client to identify the best markets for expansion.