Air Quality Data Visualization for Environmental Research

Updated on: July 26, 2024

Understanding air quality data is vital for making informed decisions and implementing effective policies within communities. However, dealing with raw data without visual aids can be challenging, especially when working with large datasets. Hence, visualizations are indispensable as they help researchers simplify complex information and enhance comprehension for the general public and stakeholders.

In this blog post, we'll explore common techniques for visualizing air quality data and take a quick look at new trends in presenting this data to the public.


Key Challenges in Visualizing Air Quality Data


1. Complexity and Volume of Data

Visualizing air quality data is challenging because of the data's complexity and sheer volume. It covers many parameters, such as particulate matter (PM2.5 and PM10), nitrogen dioxide (NO2), ozone (O3), and volatile organic compounds (VOCs), among others. These parameters are reported in different units, which makes comparison and aggregation difficult. The large amount of data collected from multiple monitoring stations or sensors adds further complexity.
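A common first step toward comparable units is to convert gas mixing ratios, which are often reported in ppb, into mass concentrations in μg/m³, the unit used for particulate matter. The Python sketch below assumes ideal-gas behavior at 25 °C and 1 atm, which gives a molar volume of 24.45 L/mol. The function names are illustrative, and some agencies use different reference conditions.

```python
# Convert gas mixing ratios (ppb) to mass concentrations (μg/m³) so that
# gases can be compared with particulate matter on a common unit.
# ASSUMPTION: ideal gas at 25 °C and 1 atm (molar volume 24.45 L/mol);
# some agencies use other reference conditions.
MOLAR_VOLUME_L_PER_MOL = 24.45

# Molecular weights in g/mol.
MOLECULAR_WEIGHT = {"NO2": 46.0055, "O3": 47.9982}


def ppb_to_ugm3(ppb, gas, molar_volume=MOLAR_VOLUME_L_PER_MOL):
    """ppb -> μg/m³ using ugm3 = ppb * MW / molar volume."""
    return ppb * MOLECULAR_WEIGHT[gas] / molar_volume


def ugm3_to_ppb(ugm3, gas, molar_volume=MOLAR_VOLUME_L_PER_MOL):
    """Inverse conversion: μg/m³ -> ppb."""
    return ugm3 * molar_volume / MOLECULAR_WEIGHT[gas]


print(round(ppb_to_ugm3(53, "NO2"), 1))  # 99.7
```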

To address this, data aggregation techniques summarize information at different spatial and temporal scales. Aggregated data can be depicted through charts, maps, and graphs, clearly illustrating air quality trends. Interactive visualizations empower users to filter and explore specific parameters or locations.
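The Python sketch below is a rough illustration of temporal aggregation. It rolls hourly readings up to daily means. It skips missing or implausible values and drops days with too few valid hours. Both the 18-hour (75%) completeness rule and the plausibility range are assumptions. Substitute whatever your study protocol or regulator specifies.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_means(readings, min_hours=18, valid_range=(0.0, 1000.0)):
    """Aggregate hourly (timestamp, value) pairs into daily means.

    Missing values (None) and values outside `valid_range` are skipped, and
    days with fewer than `min_hours` valid hours are dropped entirely.
    ASSUMPTION: both thresholds are illustrative defaults.
    """
    lo, hi = valid_range
    by_day = defaultdict(list)
    for ts, value in readings:
        if value is not None and lo <= value <= hi:
            by_day[ts.date()].append(value)
    return {
        day: mean(values)
        for day, values in sorted(by_day.items())
        if len(values) >= min_hours
    }


# Example: one complete day and one day with only 10 hours of data.
hourly = [(datetime(2024, 7, 10, h), 10.0 + h) for h in range(24)]
hourly += [(datetime(2024, 7, 11, h), 20.0) for h in range(10)]
print(daily_means(hourly))  # only 2024-07-10 survives, with a mean of 21.5
```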

Alejandra Lizama, a natural resources engineer and project coordinator at MindEarth, a Swiss geo-spatial solutions provider, shared:

"Aggregating air quality data poses challenges such as heterogeneity, quality, time and volume of data. To overcome these, we implement robust data cleaning protocols to remove inconsistencies and errors, ensuring the data is reliable and accurate for analysis. This includes filtering out anomalous readings and calibrating sensor data. Moreover, we ensure that all data is converted into a standardized format to facilitate seamless integration and comparison. This involves using common units of measurement and consistent data structures".

2. Spatial and Temporal Variability

Air quality varies across space and time because of localized emissions, weather patterns, and geographical factors, which makes it difficult to represent. Static maps may fail to capture changes that occur over the course of a day or across seasons. Understanding how air pollution correlates with external factors is also essential for identifying sources and formulating mitigation strategies.

To overcome these challenges, time series plots illustrate temporal variations, while heatmaps or contour plots reveal spatial patterns. Color-coded markers indicate pollutant levels at monitoring stations. Real-time or near-real-time data feeds offer up-to-date air quality information. Correlation plots, overlaying maps, and integrated dashboards facilitate the identification of relationships between air pollution and external factors.

Aotizhongxin daily PM2.5 concentration graph, an example of time series data. From Ceshine Lee, 2018

Alexander Dangel, a researcher in environmental economics at the Research Center for Environmental Economics, The University of Heidelberg, underscored:

"One of the foremost challenges in presenting air pollution information lies in accommodating its substantial spatial and temporal variations while ensuring digestibility. We acknowledge the considerable disparity in air pollution levels between streets, neighborhoods, cities, and regions, as well as the significant fluctuations over days, weeks, months, and seasons. Selecting the appropriate spatial and temporal scale is pivotal in addressing our research inquiries. Often, we are constrained by available AQ data and must rely on secondary estimates of personal air pollution exposure (i.e., population-level ambient air pollution estimates)”.

In addition, Alejandra explained how to choose the correct spatial and temporal scale for addressing research questions:

"Choosing the right spatial and temporal scale implies defining clear research objectives in order to plan the data collection campaigns accordingly. Here, it is crucial to consider the various aspects that affect the phenomenon or variable being studied (e.g. environmental factors, human factors, etc.) and practical capabilities for performing data analysis in order to select the appropriate scales. Another key point is to consider the environmental conditions for planning data collection, including timing, repetitions, accessibility, and limitations that may affect data collection".

3. Communication and Interpretation

Effectively conveying air quality data to diverse audiences poses a significant challenge, given stakeholders' varying levels of expertise.

Visualizations should cater to the target audience, employing appropriate metaphors and intuitive interfaces. Color scales, legends, and tooltips offer contextual information and interpretation guidance. Interactive visualizations empower users to delve into data and access additional details.

Sotirios Papathanasiou, owner of the “See the Air” blog and a WELL Building Institute Air Advisor, explained how to make visualizations accessible and understandable to a broad audience, including non-experts:

“It depends on the audience, but a nice and clean visualization with colors clearly describing what you are trying to communicate is essential. For example, everyone can understand a heat map, so if you combine a heat map and a calendar, even non-experts will understand that specific dates throughout the year are experiencing air pollution for X, Y, and Z reasons. That way, next time you will be prepared to mitigate or avoid air pollution events”.
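A calendar heat map rests on a simple data structure: daily values laid out as a grid of weeks by weekdays, with each cell colored by its value. Below is a minimal Python sketch of that layout step. In R, openair's `calendarPlot()` produces the complete plot. The function name and the Monday-first layout are assumptions.

```python
from datetime import date, timedelta


def calendar_grid(daily, year, month):
    """Lay out one month of daily values as rows of weeks (Monday first).

    `daily` maps datetime.date -> value. Cells before the 1st, after the
    last day, or without data are None; each remaining cell is later
    colored by its value to form the calendar heat map.
    """
    first = date(year, month, 1)
    next_month = date(year + month // 12, month % 12 + 1, 1)
    grid = []
    week = [None] * first.weekday()  # pad the days before the 1st
    day = first
    while day < next_month:
        week.append(daily.get(day))
        if len(week) == 7:
            grid.append(week)
            week = []
        day += timedelta(days=1)
    if week:  # pad the final partial week
        grid.append(week + [None] * (7 - len(week)))
    return grid


grid = calendar_grid({date(2024, 7, 10): 35.0}, 2024, 7)
print(len(grid), grid[1][2])  # 5 35.0 (July 10, 2024 is a Wednesday)
```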

Alejandra Lizama elaborated on the specific techniques used for creating accurate and informative visualizations from large air quality datasets:

"To create accurate and informative visualizations we employ several advanced techniques. We start with data preprocessing, which involves cleaning and standardizing the data to ensure accuracy and consistency. This is followed by spatial and temporal analysis, where we aggregate data at appropriate spatial levels such as street segments or neighborhoods, and analyze it over relevant time periods to capture variations and trends. We also envisage using statistical modeling and machine learning algorithms to identify patterns and possibly predict future air quality scenarios. Geospatial tools and software are utilized to map the data, allowing us to visualize spatial distributions and identify hotspots. Interactive dashboards could also be created to enable users to explore the data dynamically, providing a comprehensive and user-friendly way to understand complex air quality information."

4. Inconsistent Date and Time

According to Sotirios, there is another unexpected yet significant challenge you may encounter when aggregating air quality data from various sources: dates.

“The most common problem, believe it or not, is dates. For some inexplicable reason, air quality manufacturers don’t use a standard for time and date information. The standard is ISO 8601, and everyone should agree to follow it. It doesn’t matter if you come from the US, Asia, or Europe. Here is an example: 2024-07-10 15:00:00, which represents the 10th of July 2024 at 3 p.m. Sometimes, I have to merge two columns together: Time and Date. With the help of spreadsheet software, I will create a new column with both data columns fused. I prefer using R Language as it allows me to do the same thing over and over again with just a few commands. Lubridate is an R package that allows us to establish the ISO 8601 format in our data”.
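Sotirios works in R with lubridate. In Python, the standard library can perform the same merge. In the sketch below, the date format and UTC offset are declared for each source rather than guessed, because a string like 07/10/2024 is ambiguous: it means 10 July in the US and 7 October in much of Europe. The function name and its parameters are illustrative.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601(date_str, time_str, fmt, utc_offset_hours=0):
    """Merge separate Date and Time columns into one ISO 8601 timestamp.

    ASSUMPTION: `fmt` is declared per data source rather than guessed, and
    `utc_offset_hours` records the logger's time zone.
    """
    naive = datetime.strptime(f"{date_str} {time_str}", fmt)
    aware = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return aware.isoformat()


# A US-style and a European-style export of the same local date and time.
print(to_iso8601("07/10/2024", "3:00 PM", "%m/%d/%Y %I:%M %p"))
# 2024-07-10T15:00:00+00:00
print(to_iso8601("10.07.2024", "15:00", "%d.%m.%Y %H:%M", utc_offset_hours=2))
# 2024-07-10T15:00:00+02:00
```

By default, Python's `isoformat()` puts a `T` between the date and the time. Calling `isoformat(sep=" ")` produces the space-separated form shown in Sotirios's example.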


Benefits of Visualizations in Air Quality Research

As previously discussed, air quality data often encompasses numerous variables, such as pollutant concentrations, meteorological factors, and geographical locations. Charts, graphs, maps, and interactive dashboards simplify these complex datasets and help reveal critical patterns and relationships. For instance, visualizations can reveal elevated NO2 levels during traffic jams or show how far PM2.5 travels from its primary source.

Furthermore, visualizations assist in analyzing air quality data by uncovering trends, patterns, and anomalies. Time series plots and trend lines illustrate long-term changes, while heatmaps and contour plots spotlight pollution hotspots and sources. These visual cues facilitate the identification of areas of concern and guide interventions.
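Long-term trend lines are often fitted with the Theil-Sen estimator, which openair provides as `TheilSen()`. A few extreme pollution episodes barely move this estimator. Below is a minimal Python sketch of the estimator itself. It leaves out the deseasonalization and confidence intervals that a full trend analysis would add, and the function name is illustrative.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Theil-Sen trend line: slope = median of all pairwise slopes.

    Returns (slope, intercept). The intercept is the median of y - slope * x,
    one common convention. Unlike least squares, a few extreme values
    barely move the estimate.
    """
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Ten yearly means rising by 2 units per year, with one extreme year.
years = list(range(10))
values = [2 * t + 1 for t in years]
values[5] = 100  # e.g. an exceptional wildfire year
print(theil_sen(years, values))  # (2.0, 1.0)
```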


Air Quality Visualization Techniques


Data Visualization Types

Selecting appropriate data visualization types is crucial for effectively presenting air quality data. Different types apply to different visualization goals. For example, your visualization types will vary if you want to show spatial air quality data for a specific territory, compared to if your goal is to show how air pollution changes over time in the same location. Gabe Fosse, a Data Analyst at OpenAQ, has identified several suitable visualization techniques:

“Geo-spatial heat maps offer an intuitive portrayal of pollution distribution across different regions. Line graphs, bar plots or point scatters are commonly utilized for temporal data, illustrating trends and changes over time. Additionally, scatter plots can aid in identifying correlations between various pollutants or environmental factors”.

Alejandra Lizama provided examples of specific types of data visualizations particularly suitable for representing air quality data:

Dynamic Density Maps, which show hourly variations in air quality over a typical day, highlighting how pollution levels fluctuate with human activities such as traffic and industrial operations;
Heat Maps, which represent the intensity of air pollution across different areas and are particularly useful for identifying hotspots and areas with high pollution concentrations;
Temporal Analysis Charts, which visualize trends and patterns in air quality over extended periods, revealing how air quality changes over days, weeks, or months;
Spatial Aggregation at the street segment level, which shows how different street types, influenced by factors such as traffic volume and green space, exhibit varying pollutant concentrations.
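The "typical day" behind a dynamic density map, and behind openair's `timeVariation()` plots, comes from averaging all readings that share the same hour of day. Below is a minimal Python sketch. It assumes timestamps are already in local time so that rush hours from different days line up, and the function name is illustrative.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def diurnal_profile(readings):
    """Mean concentration for each hour of the day (0-23).

    `readings` is an iterable of (datetime, value) pairs.
    ASSUMPTION: timestamps are local time, so rush hours line up across days.
    """
    by_hour = defaultdict(list)
    for ts, value in readings:
        if value is not None:
            by_hour[ts.hour].append(value)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Two days of synthetic hourly data.
readings = [(datetime(2024, 7, d, h), h + d) for d in (1, 2) for h in range(24)]
print(diurnal_profile(readings)[8])  # 9.5
```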

Data visualization types A, B, C, and E in the figure below are examples of temporal data, while D and F are examples of spatial data.


From EPA’s Community Air Monitoring Fundamentals Webinar Series

Spatial data visualization for air pollution varies based on analysis and presentation needs.

For example, heatmaps are used to show pollution intensity, highlighting hotspots.

Map of trends in smoke exposure across the United States. From Burke et al. 2021

Point maps use points to represent measurement locations, with sizes or colors indicating pollution levels.

Symbol maps place different symbols or icons at specific locations and vary their size or color to indicate different pollution levels or pollutants.

Choropleth maps use shades or colored polygons to fill administrative areas based on air quality data, showing spatial variations.

When mapping with polygons, two primary zoning types exist: regular and irregular. Regular zones maintain consistent shape and size across space, whereas irregular zones exhibit variability in shape and size.

From Hexagons for Location Intelligence: Why, When & How? by Helen McKenzie

Irregular zones can complicate analysis for data scientists, whereas regular zones make data collection and comparison more consistent. Moreover, irregular zones may introduce biases in data boundaries and perceptual distortions in cartography.

Hexagonal zones represent a regular type that confers several advantages over other shapes. They tessellate to form a contiguous grid with consistent spatial relationships, making them well-suited for depicting curves and gradual spatial transitions.
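Hexagonal binning can be sketched without any GIS software. First, convert each point to the axial coordinates of the hexagon that contains it. Then average the values in each cell. The Python sketch below assumes the points are already in a projected, metre-based coordinate system rather than raw latitude/longitude. It uses the standard cube-rounding approach for pointy-top hexagons, and the function names are illustrative. For production work, libraries such as Uber's H3 provide global hexagonal grids.

```python
import math
from collections import defaultdict
from statistics import mean


def hex_cell(x, y, size):
    """Axial (q, r) index of the pointy-top hexagon containing (x, y).

    ASSUMPTION: x and y are projected coordinates (e.g. metres) and `size`
    is the hexagon's circumradius in the same units.
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r
    # Cube rounding: round all three coordinates, then recompute the one
    # with the largest rounding error so that q + r + s == 0 still holds.
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    # Otherwise s had the largest error; q and r are already consistent.
    return rq, rr


def hex_bin_means(points, size):
    """Average the values of (x, y, value) points per hexagonal cell."""
    cells = defaultdict(list)
    for x, y, value in points:
        cells[hex_cell(x, y, size)].append(value)
    return {cell: mean(values) for cell, values in cells.items()}


points = [(0.0, 0.0, 10.0), (0.1, 0.1, 20.0), (math.sqrt(3), 0.0, 5.0)]
print(hex_bin_means(points, 1.0))  # {(0, 0): 15.0, (1, 0): 5.0}
```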

Color Schemes

Color plays a significant role in accurately and intuitively representing air quality levels. While governmental organizations and non-profits commonly select color schemes based on air quality index values or pollutant concentrations to ensure clarity for the general public, researchers often prefer using gradients. Alexander Dangel explained:

“There is a growing consensus that the US EPA's air pollution categories and color scheme are best practices for communicating air quality with the general public. However, this does not apply to scientific visualizations. In our research, we aim to use gradients that clearly distinguish outcome differences. The graphic below also illustrates various spatial scales effectively”.


Pollution concentration map. From Fowlie et al. 2018
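The EPA scheme that Alexander mentions works in two steps. Each concentration is mapped to an AQI value by linear interpolation between category breakpoints. The value then receives its category color. The Python sketch below uses the PM2.5 breakpoints from EPA's 2024 revision as we understand them. Treat the table as an assumption and check it against EPA's current AQI technical assistance document before relying on it. The function name is illustrative.

```python
import math

# PM2.5 24-hour breakpoints (μg/m³) following EPA's 2024 revision.
# ASSUMPTION: verify against EPA's current AQI technical assistance
# document before use; the table has changed over time.
PM25_BREAKPOINTS = [
    # (C_low, C_high, I_low, I_high, category, color)
    (0.0, 9.0, 0, 50, "Good", "green"),
    (9.1, 35.4, 51, 100, "Moderate", "yellow"),
    (35.5, 55.4, 101, 150, "Unhealthy for Sensitive Groups", "orange"),
    (55.5, 125.4, 151, 200, "Unhealthy", "red"),
    (125.5, 225.4, 201, 300, "Very Unhealthy", "purple"),
    (225.5, 325.4, 301, 500, "Hazardous", "maroon"),
]


def pm25_aqi(conc):
    """Return (AQI, category, color) for a PM2.5 concentration in μg/m³.

    AQI = (I_high - I_low) / (C_high - C_low) * (C - C_low) + I_low
    """
    # Truncate to 0.1 μg/m³; the tiny epsilon guards against float error
    # (e.g. 35.4 * 10 landing just below 354).
    c = math.floor(conc * 10 + 1e-9) / 10
    for c_lo, c_hi, i_lo, i_hi, category, color in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            aqi = (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
            return int(aqi + 0.5), category, color  # round half up
    raise ValueError(f"concentration outside the AQI table: {conc}")


print(pm25_aqi(12.0))  # (56, 'Moderate', 'yellow')
```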

Gabe Fosse also emphasized the importance of selecting appropriate color schemes or gradients, stating that it is crucial for accurately conveying the underlying data.

“It is typically recommended to use colorblind-friendly palettes and maintain consistent colors, such as warmer colors for higher values and cooler colors for lower ones. Gradients should be chosen to highlight critical thresholds. For air quality data, many prefer "sequential" color schemes progressing from light to dark or transitioning from one color to another. When representing more complex data, "diverging" color schemes can be employed, incorporating two contrasting colors to indicate values above and below a midpoint.”
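The difference between sequential and diverging schemes is easy to see in code. This Python sketch interpolates linearly in RGB between two or three anchor colors. The endpoint colors are borrowed from ColorBrewer's YlOrRd and RdBu palettes, and the function names and defaults are illustrative. Established palettes, such as viridis or the full ColorBrewer ramps, use more color stops and are designed to be perceptually uniform. Prefer them where they are available.

```python
def _hex_to_rgb(hex_color):
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))


def _rgb_to_hex(rgb):
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def _lerp(c0, c1, t):
    """Linear interpolation between two hex colors, t in [0, 1]."""
    a, b = _hex_to_rgb(c0), _hex_to_rgb(c1)
    return _rgb_to_hex(tuple(round(x + (y - x) * t) for x, y in zip(a, b)))


def _clamp(t):
    return min(max(t, 0.0), 1.0)


def sequential(value, vmin, vmax, light="#ffffcc", dark="#800026"):
    """Light-to-dark ramp: higher values get darker (warmer) colors."""
    return _lerp(light, dark, _clamp((value - vmin) / (vmax - vmin)))


def diverging(value, vmin, midpoint, vmax,
              low="#2166ac", mid="#f7f7f7", high="#b2182b"):
    """Blue below the midpoint (e.g. a threshold), red above, neutral at it."""
    if value <= midpoint:
        return _lerp(low, mid, _clamp((value - vmin) / (midpoint - vmin)))
    return _lerp(mid, high, _clamp((value - midpoint) / (vmax - midpoint)))


print(sequential(100, 0, 100), diverging(50, 0, 50, 100))  # #800026 #f7f7f7
```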


Tools for Data Visualization


Excel/Google Sheets

Spreadsheets are excellent for simple analyses, including time series, comparison plots, and correlations. They can also serve as a basic data management system (DMS) for ingesting and processing data. However, they are best suited to smaller datasets, such as those from one or two sites.

Python and R

Python and R are both free, open-source programming languages that support statistical computing and graphics.

They have large user communities that encourage code sharing. Before choosing between them, consider your team's programming experience (taking potential learning curves into account) and the availability of resources and expertise throughout the project. Useful resources include openair, an R package for air quality analysis; AirSensor, an open-source R package; DataViewer; the EPA's sensortoolkit for air sensor data analysis; and various Python code libraries for evaluating air sensor data.

Sotirios Papathanasiou prefers using R for his air quality analyses over other data visualization tools:

“Unfortunately, spreadsheet software cannot handle large data frames. As a result, R comes in handy. Some people use Python as well but I prefer R on RStudio. On RStudio and with the help of various well-written packages like openair and ggplot2 you can create and customize the visualizations. From Air Quality Calendars to Simple Plots, and Trend Plots to Time Variation Plots you can create high-quality visualizations that will help you and others understand air pollution events and trends”.

To learn more about visualizing air quality data from different sources in RStudio, check out Sotirios’s detailed article.

Tableau

Tableau offers a user-friendly interface with robust capabilities. It seamlessly integrates with numerous data sources and provides a range of visualization options, from charts to maps.

Tableau's products range from desktop to server and web-based versions, and the company also offers CRM solutions. Tableau Public is a free option suitable for learning the software; however, any visualizations created with it are publicly accessible, so it is not recommended for proprietary or sensitive data.

This technical guide provides step-by-step instructions for visualizing air quality data collected by Atmotube PRO portable air quality monitors. It can also serve as a general guideline for visualizing data from any CSV file.

A CSV file from Atmotube PRO contains the following air quality parameters that can be visualized using Tableau Public.

ArcGIS

Utilizing various visualization methods, ArcGIS aids in data exploration, analysis interpretation, and result communication. Whether through maps, charts, or 3D scenes, it facilitates comparison, distribution visualization, relationship exploration, geospatial data management, spatial analysis, and map production.

QGIS

QGIS is a powerful, free, and open-source geographic information system (GIS) software for creating, visualizing, analyzing, and managing geospatial data. It allows users to create maps, edit spatial data, and perform various spatial analysis tasks.


Examples of Air Quality Data Visualizations


Community-Engaged Study to Assess the Spatial Distribution of PM2.5 Concentrations across Disadvantaged Communities, Santa Ana, CA

GREEN-MPNA and UCI utilized Atmotube PRO to monitor air quality in Santa Ana, CA, with funding from the California Air Resources Board. The project aimed to raise awareness of environmental justice concerns in the region, investigate the connection between socioeconomic factors and air quality, and discern whether air pollution stemmed primarily from industrial sources or traffic.

In this study, two air quality data visualization techniques were employed: mapping of air pollution hotspots and sources, and boxplots.

The figure below presents boxplots of PM2.5 concentrations averaged across each time period (T1 = morning, T2 = midday, and T3 = evening) alongside relevant urban features for measurements collected on the sampling day in February. The symbol “X” represents average concentrations.


From Community-Engaged Use of Low-Cost Sensors to Assess the Spatial Distribution of PM2.5 Concentrations across Disadvantaged Communities: Results from a Pilot Study in Santa Ana, CA
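The statistics behind each box in a figure like this one are straightforward to compute. The Python sketch below groups readings by time period (T1, T2, T3). It returns the quartiles, the extremes, and the mean, which is the "X" marker. The function names are hypothetical. The whiskers here span the minimum and maximum, whereas many plotting tools cap them at 1.5 × IQR.

```python
from statistics import mean, quantiles


def box_stats(values):
    """Quartiles, extremes and mean (the "X" marker) for one boxplot."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": med,
            "q3": q3, "max": max(values), "mean": mean(values)}


def stats_by_period(readings):
    """Group (period, value) pairs, e.g. ("T1", 14.2), and summarize each."""
    grouped = {}
    for period, value in readings:
        grouped.setdefault(period, []).append(value)
    return {period: box_stats(vals) for period, vals in sorted(grouped.items())}


readings = [("T1", 1), ("T1", 2), ("T1", 3), ("T1", 4), ("T1", 5),
            ("T2", 10), ("T2", 20)]
print(stats_by_period(readings)["T1"])
```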


The next figure shows one-minute average PM2.5 measurements projected across the Focus Area in Santa Ana using high-resolution GPS tracking devices. Red dots indicate PM2.5 levels exceeding 12 μg/m³.
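One-minute averaging of mobile, GPS-tagged readings can be sketched in two steps. Samples are grouped by minute, and any average above the 12 μg/m³ level used in the figure is flagged. This is not the study's published method. The input layout, the use of the mean position for each minute, and the function name are all assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

THRESHOLD_UGM3 = 12.0  # exceedance level used for the red dots in the figure


def minute_averages(samples):
    """Average GPS-tagged PM2.5 samples into one point per minute.

    samples: iterable of (datetime, lat, lon, pm25).
    ASSUMPTION: each minute is placed at the mean position of its samples.
    """
    buckets = defaultdict(list)
    for ts, lat, lon, pm25 in samples:
        buckets[ts.replace(second=0, microsecond=0)].append((lat, lon, pm25))
    points = []
    for minute, rows in sorted(buckets.items()):
        lats, lons, values = zip(*rows)
        avg = mean(values)
        points.append({"minute": minute, "lat": mean(lats), "lon": mean(lons),
                       "pm25": avg, "exceeds": avg > THRESHOLD_UGM3})
    return points


samples = [
    (datetime(2024, 2, 1, 10, 0, 5), 33.74, -117.87, 10.0),
    (datetime(2024, 2, 1, 10, 0, 35), 33.74, -117.87, 16.0),
    (datetime(2024, 2, 1, 10, 1, 10), 33.75, -117.88, 5.0),
]
for point in minute_averages(samples):
    print(point["minute"].time(), point["pm25"], point["exceeds"])
```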


Satellite Data on Wildfire Location

Another example is the US EPA's Fire and Smoke Map, which overlays satellite data on wildfire locations, smoke transport modeling, and public and private air quality monitor readings.

The Fire and Smoke Map, built using Esri’s ArcGIS software, offers real-time updates on air quality and fire activity.

It gathers data from low-cost air quality sensors, filters out inconsistencies, and calculates a single Air Quality Index (AQI) score over time using the EPA's NowCast formula. Flame icons highlight areas with reported fires, drawing from the US National Interagency Fire Center and satellite detection.
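NowCast gives recent hours more weight when concentrations change quickly. The Python sketch below follows EPA's published PM NowCast procedure as we understand it:

- a 12-hour window;
- a weight factor of min/max, floored at 0.5;
- at least two valid hours among the three most recent.

It omits the sensor-correction steps that the Fire and Smoke Map applies to low-cost sensor data, and the function name is hypothetical. The resulting concentration is then converted to an AQI value using the standard breakpoint table.

```python
def nowcast_pm(hourly):
    """Simplified EPA NowCast for particulate matter (μg/m³).

    hourly[0] is the most recent hourly average and hourly[11] the oldest;
    missing hours are None. Returns None when fewer than two of the three
    most recent hours are valid.
    """
    window = list(hourly[:12])
    if sum(v is not None for v in window[:3]) < 2:
        return None
    valid = [v for v in window if v is not None]
    c_max, c_min = max(valid), min(valid)
    # Weight factor 1 - (max - min) / max == min / max, floored at 0.5.
    w = 1.0 if c_max == 0 else max(c_min / c_max, 0.5)
    num = sum(w ** i * v for i, v in enumerate(window) if v is not None)
    den = sum(w ** i for i, v in enumerate(window) if v is not None)
    return num / den


print(nowcast_pm([15.0] * 12))  # 15.0 for steady concentrations
print(nowcast_pm([None, None, 10.0]))  # None: too few recent hours
```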

The Fire and Smoke Map provides information on current air quality, fire locations, smoke plumes, and recommendations based on your location to help you protect yourself from smoke.

The icons on the map are clickable:

Special Smoke Outlook map. From AirNow Fire and Smoke Map


Innovative Approaches and Emerging Trends in Air Quality Data Visualizations

Alexander Dangel outlined several interesting approaches to air quality data visualizations:

Integrating satellite measurements into data visualization

Satellites equipped with remote sensing technology can measure atmospheric composition and pollutant concentrations on a large scale. This data helps researchers and policymakers understand regional and global air pollution patterns, and evaluate the effectiveness of mitigation strategies.

An example of this approach is the Air Quality API by Google Maps Platform, which combines information from multiple data sources, including satellites. Drawing on several sources helps keep data available even if one source becomes unavailable.

Overlaying different information onto pollution maps (e.g., smoke and fire, pollution sources, wind direction, etc.)

This approach deepens the understanding of air pollution by integrating additional relevant data layers. For example, by overlaying smoke and fire information onto a pollution map, users can identify areas where elevated pollution levels are likely caused by wildfires. Similarly, overlaying pollution sources such as factories, power plants, or traffic hotspots can show whether pollution stems from industrial emissions, traffic, or other origins.
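Wind direction overlays are often summarized as a pollution rose. A pollution rose shows the average concentration for each compass sector the wind blows from. In R, openair provides this as `pollutionRose()`. Below is a minimal Python sketch with eight 45-degree sectors. The sector count and the function names are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sector(direction_deg):
    """Map a wind direction (degrees the wind blows FROM) to one of eight
    45-degree sectors centred on N, NE, E, ... (N covers 337.5-22.5)."""
    return SECTORS[int(((direction_deg % 360) + 22.5) // 45) % 8]


def pollution_rose(readings):
    """Mean concentration per wind sector.

    readings: iterable of (wind_direction_deg, concentration). Sectors with
    consistently high means can hint at the direction of a source.
    """
    by_sector = defaultdict(list)
    for direction, conc in readings:
        by_sector[sector(direction)].append(conc)
    return {s: mean(by_sector[s]) for s in SECTORS if by_sector[s]}


print(pollution_rose([(10, 40), (350, 20), (180, 5)]))  # {'N': 30, 'S': 5}
```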

Combining private (citizen) air quality measurements with data from government monitors

These solutions allow individuals or communities to use personal monitoring devices or smartphone applications to contribute their air quality measurements. The collected data can be combined with the data from government monitoring stations to create a comprehensive view of air quality in a given area. Visualizations can represent both the official government data and the citizen-contributed data on the same map or graph, enabling a direct comparison.
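Before plotting citizen and government data together, it helps to measure how well they agree where they overlap. The Python sketch below matches the two series on timestamp and reports the mean bias and the mean absolute error. It assumes both series have already been averaged to the same hourly timestamps, and the function name is hypothetical.

```python
from statistics import mean


def compare_colocated(citizen, government):
    """Compare a citizen sensor with a nearby government monitor.

    Both inputs map an hourly timestamp string to a concentration; only
    hours present in both series are compared. Returns the number of
    matched hours, the mean bias (citizen - government) and the mean
    absolute error, or None if the series never overlap.
    """
    common = sorted(citizen.keys() & government.keys())
    if not common:
        return None
    diffs = [citizen[t] - government[t] for t in common]
    return {"n": len(common),
            "mean_bias": mean(diffs),
            "mae": mean(abs(d) for d in diffs)}


citizen = {"2024-07-10T15:00": 14.0, "2024-07-10T16:00": 18.0,
           "2024-07-10T17:00": 9.0}
government = {"2024-07-10T15:00": 12.0, "2024-07-10T16:00": 15.0}
print(compare_colocated(citizen, government))
```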

Alejandra Lizama agreed with Alexander's ideas and highlighted the use of portable sensors:

"Emerging technologies like ground-based and portable sensors are set to greatly improve how we measure pollutant concentration. As these sensors become more widespread, they provide more detailed and comprehensive information about air quality across different territories. This results in an improved spatial and temporal resolution, offering a more complete and nuanced view of how air quality patterns change according to particular environmental conditions."

According to Alejandra, digital twin technology is another emerging method in data visualization. Here are her thoughts on it:

"Digital twin technology is revolutionizing environmental modeling and analysis. Digital twins enable real-time simulation of air quality, allowing for detailed visualization and scenario analysis. For example, MindEarth is applying this technology to investigate the intricate relationship between traffic, mobility, and air quality. This focus addresses a critical gap in understanding the interplay between these variables, thereby supporting more informed decision-making and the development of resilient urban environments."

Gabe identified interactive dashboards and real-time data feeds as another innovative approach.

***

If you are interested in air quality research, check out these projects showcasing different applications of air quality data.