Remote sensing images are great sources for ecological researches. They are especially useful for natural resoures and a variety of socio-economic researches and applications. This section will introduce how to manipluate these data.
This section is adopted from the module. we will primarily work with the Vegetation Continuous Fields (VCF) data provided by the Land Processes Distributed Active Archive Center (LP DAAC), a component of NASA’s Earth Observing System Data and Information System (EOSDIS). The MOD44B Version 6 VCF is a yearly representation of surface vegetation from 2000 to 2020 at 250 m resolution. Each pixel stores a percentage of three ground cover components: percent tree cover, percent non-tree cover, and percent bare.
The ground cover percentages are estimates from a machine learning model based on the combination of the Moderate Resolution Imaging Spectroradiometer (MODIS) data and other high resolution data from NASA and Google Earth. The machine learning model incorporates the visible bandwidth as well as other bandwidth such as brightness temperature (from MODIS bands 20, 31, 32).
The VCF data utilize thermal signatures and other correlates to distinguish forest and non-forest plantation, which is an improvement compared to the Normalized Differenced Vegetation Index (NDVI). For this use case, VCF also improves on the Global Forest Cover (GFC) data set, another source used to study deforestation, which only provides binary data points. GFC records baseline forest cover in the year 2000 and includes a binary indicator for the year of deforestation for each 30m × 30m pixel. If over 90% of the forest cover was lost in a pixel by a given year, then the pixel is marked as deforested, while a pixel is marked as reforested if the forest cover went from 0 in 2000 to a positive value. The VCF records continuous changes in percent of ground cover components, which provides more details than the GFC data.
There is a strong correlation between nighttime lights and Gross State Product (GSP) or Gross Domestic Product (GDP) measures, at the national, state and regional levels or even at a more granular resolution. Thus, nighttime light observations can be used as a proxy for economic activity, especially over periods or regions where these data are not available or where the statistical systems are of low quality or when no recent population or economic censuses are available. Similarly, changes in nighttime light intensity can be used by economists as an additional measure of income growth when no other measures of income growth are available.
Proville et al. (2017) examined trends observed by DMSP-OLS in the period 1992-2013 and their correlation with a series of socio-economic indicators. They found the strongest correlations between nighttime lights, electricity consumption, CO2 emissions, and GDP, followed by population, CH4 emissions, N2O emissions, poverty and F-gas emissions.
In order to perform data manipulation, we need to attach packages. We are going to use the package luna to download data from MODIS and the packages terra, tidyverse, raster, and sf for data manipulation.
We follow thistutorial to get MODIS data with luna. For details of the terra package, please refer to the package manuscript and this tutorial. If you are not familiar with the tidyverse workflow, please refer to the R for Data Science.
Once the required packages have been attached, we can access VCF in R. We prefer using R for its ability to download large numbers of files and enable regular, automated updates.
We can first use luna to check the list of data products available from MODIS. Since luna can also access data from the LANDSAT and SENTINEL platforms, we add “MOD|MYD|^MCD” to narrow our scope to MODIS data. The printed results below list six products from MODIS.
The product name for VCF is MOD44B. We can use the function productInfo to launch the information page of VCF.
We can query MODIS and only download a subset of the data. We need to specify the start and end dates and our area of interest (AOI). The date format is “yyyy-mm-dd”. Suppose here we want to subset data from 2010 to 2012.
In order to subset your area of interest, you need to provide a “map” to getModis(). This can be obtained from online databases such as the global administrative area database (GADM). You can download map data directly from GADM or you can use R to obtain GADM map data. We will use R below, which requires first installing the package geodata.
Geographic levels in GADM are defined as:
level 0: National level 1: State/province/equivalent level 2: County/district/equivalent level 3/4: Smaller administrative levels For our example, we are interested in India at the district level. We can download the map of India and its level 2 administrative areas with the following code:
The boundary data is downloaded to the path that you specified in the path argument. The downloaded data through gadm() will be in the PackedSpatVector class. If you want to convert it to another class (for example, the sf class, which is easier to work with in R), you can first read it using readRDS(), then convert to a SpatVector via vect() from the terra package, and finally convert it to a sf object.
The map we downloaded is at the district level (level 2). Assume our AOI is the state of Odisha. Each row of the data represents a county in Odisha, and the geospatial information for each county is stored in the last column: geometry. We can filter to obtain the boundaries for our AOI, which will return aoi in vector format, stored as a data frame in R.
Now that we have our AOI as well as time frame, we can filter the MODIS VCF data on these values and see what is available.
The products we are going to download are tiled products. For details of tiled products, the tilling system, and the naming convention, please refer to the MODIS overview page. In essence, we will be downloading grids of maps that cover our AOI.
To actually download these files from the NASA server, you will need a username and password. Please register on NASA Earth Data if you haven’t done so.
The following code will download the files. Replace the path value with the location on your computer where you would like to store these files. Replace the username and password values with your NASA Earth Data credentials from above.
The data format from MODIS is HDF and may include sub-datasets. We can use terra to read these files and create raster files. For example,
We can find basic information such as the coordinate reference system, number of cells, and resolution from the above output. There are 7 layers in each of the VCF tiled files. We are interested in the percent tree coverage layer.
A quick plot of the data can be done with the plotRBG() function.
Since there are four hdf files in each year for our AOI, we can first merge the four SpatRaster files into one file per year. We’ll use 2010 as an example. We can filter to only include our layer of interest - percent of tree cover - from each hdf file, which can be done by subsetting the output using [[1]] (using 1 because percent tree cover is the first layer in each file).
Before we merge these SpatRster objects, it is often a good practice to check their origins and resolutions. merge requires origin and resolution to be the same across objects.
We see that origins of these files are slightly different, but all are close to (0, 0). We do not need to worry about these slight differences, as merge will handle them automatically.
Note: cells with 200% represent water and rivers.
We are now ready to crop and mask the raster file to match our AOI. This tutorial explains the difference between cropping and masking.
To crop a raster file according to vector data boundaries (eg, our aoi object representing Odisha districts), we first align the coordinate reference systems of our raster file and vector file. Then, use crop(raster data, vector data). To mask, use mask(raster data, vector data). Note that for terra::mask(), the second argument needs to be SpatVector. terra does not support sf objects yet, so we use vect(aoi) to convert our sf object aoi to a SpatVector.
To plot our new raster file, we use:
After we have cropped and masked the raster file to our AOI, we can extract values for each county in the state of Odisha.
The values extracted by terra::extract are stored in a data frame. Note that the ID corresponds to the row number of your vector file (i.e. object aoi in our case). We can then compute statistics based on this data frame. Here we compute several statistics describing the percent of forest cover for each county. Note that cells with 200% represent water and river and should be excluded from calculation.
With terra you can easily write shape files and several formats of raster files. The main function for writing vector data is writeVector(), while for writing raster data we use writeRaster(). For details, you can refer to this page and the documentation of terra.
We will replicate some main results in the paper. To access the full replication data and code, check this github repo. We are going to replicate Table 3 in the paper.
The research question is whether newly constructed rural roads impact local deforestation. The authors explored this question using two empirical strategies: fuzzy RD and difference-in-difference. In the following sections, we implement the difference-in-difference method and replicate the regression results.
In order to run fixed effects models, we will need the fixest package. This tutorial is a good reference for introducing fixest functions.
Data for this exercise was processed and stored in pmgsy_trees_rural_panel.csv, which you can find the through the link to the CSV data in the github repo. Each row of the data frame presents a village in a specific year.
The paper estimated the following equation:
\[ Forest_{vdt} = β_{1}Award_{vdt} + β_{2}Complete_{vdt} + α_{v} + γ_{dt} + X_{v}⋅V_{t} + η_{vdt} \]
where \(Forest_{vdt}\) is forest cover of village \(v\) in district \(d\) in year \(t\). \(Award_{vdt}\) is a dummy variable which takes one during the period when the new road is awarded to the village but has not been built. \(Complete_{vdt}\) is also a dummy variable which takes one for all years following the completion of a new road to village \(v\). \(α_{v}\) are village fixed effects, while \(γ_{dt}\) are the district-year fixed effects. \(X_{v}\) controls some baseline characteristics (e.g. forest cover in 2000, total population) and is interacted with year fixed effects \(V_{t}\).
There is one more step before we run the regressions. In Stata, which the authors used for their regression, reghdfe removed singleton groups automatically. However, the fixest package currently doesn’t possess this functionality, so for now, we will manually remove these observations.
Finally, we can run our regressions. Following the authors, we test the effect of being awarded a new road and receiving the road on the log forest cover as well as on the average forest cover.
Our results align with the authors’ findings presented in Table 3 which show that being awarded a road has a negative impact on forest cover (approximately 0.5% loss in the construction period between being awarded a road and its completion), but after the road is constructed, forest cover appears to return. This could incorrectly be interpreted as a positive effect of roads on tree cover if the award term is left out. This determination that rural roads have no effect on forest loss, in combination with the authors’ additional findings of substantial forest loss due to highway construction, have important policy implications for governments considering similar infrastructure expansion. The use of VCF data in this study enabled significant insights, and the potential use cases for VCF data remain numerous.