Texas Outage

Analyzing the Effects of the Texas February 2021 Storm on the Houston metropolitan Area
MEDS
Spatial
R
Author
Affiliation

Javier Patrón

Published

October 26, 2022

Overview

“In February 2021, the state of Texas suffered a major power crisis, which came about as a result of three severe winter storms sweeping across the United States on February 10–11, 13–17, and 15–20.”1 For more background, check out these engineering and political perspectives.

The tasks in this post are: - Estimating the number of homes in Houston that lost power as a result of the first two storms
- Investigating if socioeconomic factors are predictors of communities recovery from a power outage

The analysis will be based on remotely-sensed night lights data, acquired from the Visible Infrared Imaging Radiometer Suite (VIIRS) onboard the Suomi satellite. In particular, you will use the VNP46A1 to detect differences in night lights before and after the storm to identify areas that lost electric power.

To determine the number of homes that lost power, you link (spatially join) these areas with OpenStreetMap data on buildings and roads.

To investigate potential socioeconomic factors that influenced recovery, you will link your analysis with data from the US Census Bureau.

Spatial Skills:
  • Load vector/raster data
  • Simple raster operations
  • Simple vector operations
  • Spatial joins

Data

Night lights

Use NASA’s Worldview to explore the data around the day of the storm. There are several days with too much cloud cover to be useful, but 2021-02-07 and 2021-02-16 provide two clear, contrasting images to visualize the extent of the power outage in Texas.

VIIRS data is distributed through NASA’s Level-1 and Atmospheric Archive & Distribution System Distributed Active Archive Center (LAADS DAAC). Many NASA Earth data products are distributed in 10x10 degree tiles in sinusoidal equal-area projection. Tiles are identified by their horizontal and vertical position in the grid. Houston lies on the border of tiles h08v05 and h08v06. We therefore need to download two tiles per date.

Roads

Typically highways account for a large portion of the night lights observable from space (see Google’s Earth at Night). To minimize falsely identifying areas with reduced traffic as areas without power, we will ignore areas near highways.

OpenStreetMap (OSM) is a collaborative project which creates publicly available geographic data of the world. Ingesting this data into a database where it can be subsetted and processed is a large undertaking. Fortunately, third party companies redistribute OSM data. We used Geofabrik’s download sites to retrieve a shape file of all highways in Texas and prepared a Geopackage (.gpkg file) containing just the subset of roads that intersect the Houston metropolitan area. 

  • gis_osm_roads_free_1.gpkg

Houses

We can also obtain building data from OpenStreetMap. We again downloaded from Geofabrick and prepared a GeoPackage containing only houses in the Houston metropolitan area.

  • gis_osm_buildings_a_free_1.gpkg

Socioeconomic

We cannot readily get socioeconomic information for every home, so instead we obtained data from the U.S. Census Bureau’s American Community Survey for census tracts in 2019. The folder ACS_2019_5YR_TRACT_48.gdb is an ArcGIS “file geodatabase”, a multi-file proprietary format that’s roughly analogous to a GeoPackage file.

The geodatabase contains a layer holding the geometry information, separate from the layers holding the ACS attributes. You have to combine the geometry with the attributes to get a feature layer that sf can use.

Assignment

Find locations of blackouts

For improved computational efficiency and easier interoperability with sf, we will use the stars package for raster handling.

Code
# Setting my filepaths
rootdir <- ("/Users/javipatron/Documents/MEDS/Courses/eds223")
datatif <- file.path(rootdir,"data","VNP46A1")
data <- file.path(rootdir,"data")

#Creating the names for each file
nightlight1 <- 'VNP46A1.A2021038.h08v05.001.2021039064328.tif' 
nightlight2 <- 'VNP46A1.A2021038.h08v06.001.2021039064329.tif' 
nightlight3 <- 'VNP46A1.A2021047.h08v05.001.2021048091106.tif'
nightlight4 <- 'VNP46A1.A2021047.h08v06.001.2021048091105.tif'
  
#Downloading the raster data to a star object
one <- read_stars(file.path(datatif, nightlight1))
two <- read_stars(file.path(datatif, nightlight2))
three <- read_stars(file.path(datatif,nightlight3))
four <- read_stars(file.path(datatif,nightlight4))
Code
#Combine tiles to have the full size
lights_off <- st_mosaic(one,two)
lights_on <- st_mosaic(three,four)

plot(lights_off, main= "Satellite Image of Houston Feb 7th")
downsample set to 6

Code
plot(lights_on, main= "Satellite Image of Houston Feb 16th")
downsample set to 6

Create a blackout mask

  • Find the change in night lights intensity (presumably) caused by the storm
    Response: The change on the mean from the second date (All lights) goes from 13.86 down 12.13939, on the first date (Outrage)
  • Reclassify the difference raster, assuming that any location that experienced a drop of more than 200 nW cm-2sr-1 experienced a blackout
  • Assign NA to all locations that experienced a drop of less than 200 nW cm-2sr-1
Code
#Create a new raster that has the difference of the values from the Feb 7th (Lights Off) raster, and the 16th (Lights On) raster. This will just have the difference of the attribute value on each pixel

raster_diff <-  lights_off - lights_on

plot(raster_diff, main= "Difference in light intensity from Feb 7th and Feb 16th")
downsample set to 6

Vectorize the mask

  • Using st_as_sf() to vectorize the blackout mask
  • Fixing any invalid geometries by using st_make_valid
Code
# Converts the non-spatial star object (.tif) file to an sf object. An sf object will have an organizing structure that will have the geom column and the layer as attribues
blackout <- st_as_sf(raster_diff)
summary(blackout)
 VNP46A1.A2021038.h08v05.001.2021039064328.tif          geometry    
 Min.   :  200.0                               POLYGON      :24950  
 1st Qu.:  262.0                               epsg:4326    :    0  
 Median :  359.0                               +proj=long...:    0  
 Mean   :  543.8                                                    
 3rd Qu.:  563.0                                                    
 Max.   :65512.0                                                    

Crop the vectorized map to our region of interest.

  • Define the Houston metropolitan area with the following coordinates
    • (-96.5, 29), (-96.5, 30.5), (-94.5, 30.5), (-94.5, 29)
  • Turn these coordinates into a polygon using st_polygon
  • Convert the polygon into a simple feature collection using st_sfc() and assign a CRS
    • hint: because we are using this polygon to crop the night lights data it needs the same CRS
  • Crop (spatially subset) the blackout mask to our region of interest 
  • Re-project the cropped blackout dataset to EPSG:3083 (NAD83 / Texas Centric Albers Equal Area)
Code
#Create vectors for the desired CRS
lon = c(-96.5, -96.5, -94.5, -94.5,-96.5)
lat = c(29, 30.5, 30.5, 29, 29)


#Create an array or matrix with those vectors 
coordinates_array <-cbind(lon,lat)

#Creating a polygon with the st_polygon function but the coordinates_array has to be in the form of a list so the st_polygon can read it.
houston_polygon <- st_polygon(list(coordinates_array))

# Create a simple feature geometry list column, and add coordinate reference system so you can "speak" the same language than your recent blackout object
houston_geom <- st_sfc(st_polygon(list(coordinates_array)), crs = 4326)

# Indexing or Cropping the blackout sf object with just the houston geometery polygon. 

houston_blackout_subset <- blackout[houston_geom,]

# Re-project the cropped blackout dataset with a new CRS (EPSG:3083) (NAD83 / Texas Centric Albers Equal Area)
houston_projection <- st_transform(houston_blackout_subset,"EPSG:3083")
Code
plot(houston_blackout_subset, main = "Blackout pixels in Houston")

Exclude highways from blackout mask

The roads geopackage includes data on roads other than highways. However, we can avoid reading in data we don’t need by taking advantage of st_read’s ability to subset using a SQL query.

  • Define SQL query
  • Load just highway data from geopackage using st_read
  • Reproject data to EPSG:3083
  • Identify areas within 200m of all highways using st_buffer
  • Find areas that experienced blackouts that are further than 200m from a highway
Code
#Reading the data with format .gpkg
query <- "SELECT * FROM gis_osm_roads_free_1 WHERE fclass='motorway'"
highways <- st_read(file.path(data, "gis_osm_roads_free_1.gpkg"), query = query)
Reading query `SELECT * FROM gis_osm_roads_free_1 WHERE fclass='motorway''
from data source `/Users/javipatron/Documents/MEDS/Courses/eds223/data/gis_osm_roads_free_1.gpkg' 
  using driver `GPKG'
Simple feature collection with 6085 features and 10 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -96.50429 ymin: 29.00174 xmax: -94.39619 ymax: 30.50886
Geodetic CRS:  WGS 84
Code
#Transform that new sf object with the CRS we have been using, so we stay in the same space.
highways_3083 <- st_transform(highways, "EPSG:3083")
Code
#Create a buffer that contains all the roads, including the 200 meters across the roads.
# BUT, to make a buffer object we need to first create a union of all those rows (highways), so we can use the st_buffer function...
highways_union <- st_union(highways_3083)
highways_buffer <- st_buffer(x = highways_union,
          dist = 200)

plot(highways_buffer, main = "Highways in Houston")

Code
# Use the buffer to subtract those entire pixels or "rows" from the houston projection sf object.
houston_out <- houston_projection[highways_buffer, op = st_disjoint]
plot(houston_out,
     main= "Houston Without the Roadlights")

Find homes impacted by blackouts

  • Load buildings dataset using st_read and the following SQL query to select only residential buildings
  • hint: reproject data to EPSG:3083
Code
#Read the data
query2 <- "SELECT * FROM gis_osm_buildings_a_free_1 WHERE (type IS NULL AND name IS NULL) OR type in ('residential', 'apartments', 'house', 'static_caravan', 'detached')"

buildings <- st_read(file.path(data, "gis_osm_buildings_a_free_1.gpkg"), query = query2)
Reading query `SELECT * FROM gis_osm_buildings_a_free_1 WHERE (type IS NULL AND name IS NULL) OR type in ('residential', 'apartments', 'house', 'static_caravan', 'detached')'
from data source `/Users/javipatron/Documents/MEDS/Courses/eds223/data/gis_osm_buildings_a_free_1.gpkg' 
  using driver `GPKG'
Simple feature collection with 475941 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -96.50055 ymin: 29.00344 xmax: -94.53285 ymax: 30.50393
Geodetic CRS:  WGS 84

Find homes in blackout areas

  • Filter to homes within blackout areas.

    Tip: You can use [ ] option to subtract those geometries from the building sf object that do not correspond to the Houston area. Or you can use the st_filter.

  • Count number of impacted homes.

    Tip: The total number of impacted houses when using st_filter or [ ] is 139,148. This could vary depending on how the st_filter considers the limits and borders of the geometries. The default method when using st_filter is st_intersects.

Code
#Count number of houses
dim(homes_blackout_join)[1]
[1] 139148

Investigate socioeconomic factors

Load ACS data

  • Use st_read() to load the geodatabase layers
  • Geometries are stored in the ACS_2019_5YR_TRACT_48_TEXAS layer
  • Income data is stored in the X19_INCOME layer
  • Select the median income field B19013e1
  • hint: reproject data to EPSG:3083
Code
#Read the data and understand the data layers
st_layers(file.path(data, "ACS_2019_5YR_TRACT_48_TEXAS.gdb"))
Driver: OpenFileGDB 
Available layers:
                            layer_name geometry_type features fields crs_name
1                      X01_AGE_AND_SEX            NA     5265    719     <NA>
2                             X02_RACE            NA     5265    433     <NA>
3        X03_HISPANIC_OR_LATINO_ORIGIN            NA     5265    111     <NA>
4                         X04_ANCESTRY            NA     5265    665     <NA>
5         X05_FOREIGN_BORN_CITIZENSHIP            NA     5265   1765     <NA>
6                   X06_PLACE_OF_BIRTH            NA     5265   1221     <NA>
7                        X07_MIGRATION            NA     5265   1793     <NA>
8                        X08_COMMUTING            NA     5265   2541     <NA>
9  X09_CHILDREN_HOUSEHOLD_RELATIONSHIP            NA     5265    263     <NA>
10      X10_GRANDPARENTS_GRANDCHILDREN            NA     5265    373     <NA>
11    X11_HOUSEHOLD_FAMILY_SUBFAMILIES            NA     5265    781     <NA>
12      X12_MARITAL_STATUS_AND_HISTORY            NA     5265    759     <NA>
13                       X13_FERTILITY            NA     5265    399     <NA>
14               X14_SCHOOL_ENROLLMENT            NA     5265    779     <NA>
15          X15_EDUCATIONAL_ATTAINMENT            NA     5265    715     <NA>
16         X16_LANGUAGE_SPOKEN_AT_HOME            NA     5265    871     <NA>
17                         X17_POVERTY            NA     5265   3941     <NA>
18                      X18_DISABILITY            NA     5265    893     <NA>
19                          X19_INCOME            NA     5265   3045     <NA>
20                        X20_EARNINGS            NA     5265   2185     <NA>
21                  X21_VETERAN_STATUS            NA     5265    565     <NA>
22                     X22_FOOD_STAMPS            NA     5265    243     <NA>
23               X23_EMPLOYMENT_STATUS            NA     5265   1625     <NA>
24         X25_HOUSING_CHARACTERISTICS            NA     5265   4415     <NA>
25                X27_HEALTH_INSURANCE            NA     5265   1593     <NA>
26       X28_COMPUTER_AND_INTERNET_USE            NA     5265    385     <NA>
27           X29_VOTING_AGE_POPULATION            NA     5265     35     <NA>
28                      X99_IMPUTATION            NA     5265    783     <NA>
29             X24_INDUSTRY_OCCUPATION            NA     5265   2107     <NA>
30                  X26_GROUP_QUARTERS            NA     5265      3     <NA>
31                 TRACT_METADATA_2019            NA    35976      2     <NA>
32         ACS_2019_5YR_TRACT_48_TEXAS Multi Polygon     5265     15    NAD83

Determine which census tracts experienced blackouts.

  • Join the income data to the census tract geometries
  • hint: make sure to join by geometry ID
  • Spatially join census tract data with buildings determined to be impacted by blackouts
  • Find which census tracts had blackouts
Code
#Create a big sf that has the income information and its geometry
income_geom <- left_join(texas_geom, texas_income, by = "GEOID_Data")

#Create an new sf that adds the income information to just the blackout sf object that we previously had. In this step I learned that if you use st_join will give you a different number and result than st_filter or [ ]

#blackout_income_join <- st_join(homes_blackout_join, income_geom)
blackout_join <- income_geom[homes_blackout_join,]

#Print those Census Tracts a had blackouts. Im printing the number and the name
length(unique(blackout_join$NAMELSAD))
[1] 711
Code
#unique(blackout_join$NAMELSAD)

Compare incomes of impacted tracts to unimpacted tracts.

  • Create a map of median income by census tract, designating which tracts had blackouts
  • Plot the distribution of income in impacted and unimpacted tracts
  • Write approx. 100 words summarizing your results and discussing any limitations to this study
Code
#Creating a list of tracts and counties that were affected by the blackout
blackout_tracts <- unique(blackout_join$TRACTCE)
blackout_counties <- unique(blackout_join$COUNTYFP)

#Creating a Data Frames that includes only the rows that have the county affected by using the geom from blackout_tracts. One with the Counties and the other one with the Tracts.
tracts_affected <- income_geom |> 
  filter(TRACTCE %in% blackout_tracts)

counties_affected <- income_geom |> 
  filter(COUNTYFP %in% blackout_counties)

#Create a map were the base is the counties of Houston, then fill the color with the income_median column we created with "B19013e1", and then highlight the counties that were had building affected from our dataset
map <- tm_shape(counties_affected) +
  tm_fill(col = "income_median", palette = "BrBG") +
  tm_borders() +
  tm_shape(blackout_join) +
  tm_fill(col = "pink", alpha= 0.3) +
  tm_layout(legend.outside = T,
            main.title = "Median Income by Census Tract",
            frame = T,
            title = "*Affected areas in pink*") +
  tm_compass(type = "arrow", 
             position = c("left", "top")) +
  tm_scale_bar()

tmap_mode("view")
tmap mode set to interactive viewing
Code
map
Compass not supported in view mode.
Code
#Finding the difference in income between the affected tracts and unaffected.

#First we need create two sf data frames to categorize if they were impacted or not by the blackout

not_impacted <- anti_join(tracts_affected , as.data.frame(blackout_join)) |> 
  mutate(impacted = "no")
Joining with `by = join_by(STATEFP, COUNTYFP, TRACTCE, GEOID, NAME, NAMELSAD,
MTFCC, FUNCSTAT, ALAND, AWATER, INTPTLAT, INTPTLON, Shape_Length, Shape_Area,
GEOID_Data, income_median, Shape)`
Code
impacted <- blackout_join |> 
  mutate(impacted = "yes")

#Second, we need to create a new data frame were the "key" is the same column name, so we will have which tracts were impacted and which ones were not.
combination <- rbind(not_impacted, impacted) |> 
  dplyr::select(income_median, impacted)

summary(combination)
 income_median      impacted                   Shape    
 Min.   : 13886   Length:741         MULTIPOLYGON :741  
 1st Qu.: 43749   Class :character   epsg:3083    :  0  
 Median : 60414   Mode  :character   +proj=aea ...:  0  
 Mean   : 71330                                         
 3rd Qu.: 89796                                         
 Max.   :250001                                         
 NA's   :25                                             
Code
#Third, create a histogram and a box plot of that new column and analyse the income median 

ggplot(combination, aes(x = income_median, fill= impacted)) +
  geom_histogram()  +
    labs(title = "Distribution of Income",
       x = "Income Median",
       y = "Count") +
   theme(
    panel.background = element_rect(fill = "gray91"),
    panel.grid.major = element_line(colour = "grey70", size = 0.2),
    panel.grid.minor = element_line(color = "gray88"))
Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 25 rows containing non-finite values (`stat_bin()`).

Code
ggplot(combination, aes(y = income_median, fill = impacted)) +
  geom_boxplot() +
  labs(title = "Blackout Impact by Income",
       x = "Impacted",
       y = "Income Median") +
   theme(
    panel.background = element_rect(fill = "gray91"),
    panel.grid.major = element_line(colour = "grey70", size = 0.2),
    panel.grid.minor = element_line(color = "gray88"))
Warning: Removed 25 rows containing non-finite values (`stat_boxplot()`).

Code
# Statistical comparison
summary(impacted$income_median)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  13886   43749   60642   71462   90013  250001       3 
Code
summary(not_impacted$income_median)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  25755   42793   53442   59586   63670  134500      22 
Code
t.test(impacted$income_median, not_impacted$income_median)

    Welch Two Sample t-test

data:  impacted$income_median and not_impacted$income_median
t = 0.9696, df = 7.2121, p-value = 0.3636
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -16915.65  40668.75
sample estimates:
mean of x mean of y 
 71462.30  59585.75 

Summary

The results show a high number people affected by this winter storm. According to the data, almost 140,000 houses/ building were impacted within the designated area of Houston showing an almost equal impact between high income census tracts and low income census tracts. Looking in detail at the plots, the distribution plot (Histogram), is a right skew, showing a higher density around the low income median, nevertheless in the summary results you can see that the median income for the not impacted census tracts is lower that the impacted census tract, showing that the effect of the storm affected almost equally everyone in the zone, regardless of the income. It is important to consider that we are not weighting each tract by impacted houses. Which could affect final results, and conclusions.

Footnotes

  1. Wikipedia. 2021. “2021 Texas power crisis.” Last modified October 2, 2021. https://en.wikipedia.org/wiki/2021_Texas_power_crisis.↩︎

Citation

BibTeX citation:
@online{patrón2022,
  author = {Patrón, Javier},
  title = {Texas {Outage}},
  date = {2022-10-26},
  url = {https://github.com/javipatron},
  langid = {en}
}
For attribution, please cite this work as:
Patrón, Javier. 2022. “Texas Outage.” October 26, 2022. https://github.com/javipatron.