Valuing Nature NREM 691

Travel Cost

Author

LoweMackenzie

Published

September 21, 2025

Revealed Preferences: Travel Cost

The Travel Cost Method (TCM) is a commonly used approach to estimate the value of recreation sites. It was first suggested by Harold Hotelling in a 1947 letter to the National Park Service and then further developed by Marion Clawson (Clawson 1959). It provides a lower-bound estimate of how much people value a site based on the cost they incur to visit.

There are two main types of TCM models:

1. Single-Site Models

  • Focus on one specific location.

  • Use Poisson regression to estimate how many trips people take.

  • From this, you can calculate consumer surplus: the benefit people get from visiting beyond what they pay.

  • The cost of a trip includes actual expenses and the opportunity cost of travel time.

  • These models are best when you want to estimate the total value of a site. For example, if a park is closed due to pollution or budget cuts and you want to estimate the loss in value from that closure.

2. Multi-Site (or Multi-Attribute) Models

  • Focus on multiple sites or on different features (attributes) of sites.

  • Use random utility models, typically estimated with a mixed logit (random parameter logit) model.

  • These models help estimate how much people value different features, like trails, toilets, or accommodation.

  • Useful for park planning, as they help managers decide which improvements provide the most value to visitors.


The Travel Cost Model (TCM)

General Steps in the Travel Cost Modeling Process:

  1. Define the site to be evaluated

  2. Identify the types of recreation and specify the relevant season

  3. Develop a sampling strategy for data collection

  4. Specify the model, including functional form and variables

  5. Determine how to handle multi-purpose trips (i.e., trips with more than one goal)

  6. Design and conduct the visitor survey (get data from reservation system/ mobile data)

  7. Measure trip costs, including travel expenses and time value

  8. Estimate the model using the collected data

  9. Calculate the access value by estimating consumer surplus


Understanding Consumer Surplus

Consumer Surplus represents the area under the demand curve and above the actual price (travel cost) paid to access the park. This surplus reflects the net benefit or additional value visitors receive from their park experience beyond what they pay to get there. It is a commonly used metric for evaluating net recreational benefits.
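For the semi-log demand curve estimated later on this page, consumer surplus has a convenient closed form (a standard result, sketched here in notation matching the later regressions): with expected trips \( \bar{x} = e^{\beta_0 + \beta_1 TC} \) and \( \beta_1 < 0 \), integrating under the demand curve from the current travel cost upward gives

      \[ CS = \int_{TC}^{\infty} e^{\beta_0 + \beta_1 c}\, dc = -\frac{\bar{x}}{\beta_1}, \qquad \frac{CS}{\text{trip}} = -\frac{1}{\beta_1} \]

This is why the single-site examples below take the inverse of the estimated cost coefficient.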


Single Site

Also known as a count model. We will work through the following example.

  1. Define the site :

    We will look at a park that is potentially impacted by a road closure along Highway 101. The park could be closed because of sea level rise and the position of the road.

  2. Identify the types of recreation and specify the relevant season.

    The data represent campers at a campground in a popular park on the Oregon Coast. This data set specifically looks at the consumer surplus of camping at the park that would be lost if we don't do anything about the road.

    Currently the only factors we know are that the road repairs will cost a lot and that the repairs will impact access to the beach.

  3. Develop a sampling strategy for data collection

    For this project we will use data from the reservation system, which gives us information on every camper over the last 4 years.

  4. Specify the model, including functional form and variables

Load in the following data. You can download it here, but the following code should grab it as well.

library(readr)
park <- read_csv("http://raw.githubusercontent.com/loweas/ValuingNature/refs/heads/main/park.csv")
Rows: 33006 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): sitetype
dbl (8): zcta5ce10, month, year, trip, cost, income, temp_avg, mdays


Here are the variables

sitetype The type of site within the park

zcta5ce10 zip code

month month of reservation

year Year of reservation

trip number of trips per zip code per month

cost average cost per zip per month

income median income for each year at the zip code level

temp_avg average max temperature at the park in the given month

mdays the average number of days stayed at the park per zip code.

Let's take a look at the structure.

hist(park$trip)

hist(park$cost)

This distribution is clearly that of a count variable (non-negative integer observations). The peak is close to zero, which means we should consider a count model when dealing with these data.
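One quick way to probe the Poisson assumption that the variance equals the mean is to compare the two directly. A minimal sketch with simulated counts (hypothetical lambda, not the park data):

```r
# Poisson counts: the sample mean and variance should be close.
set.seed(1)
trips_sim <- rpois(n = 5000, lambda = 3.5)
round(c(mean = mean(trips_sim), variance = var(trips_sim)), 2)

# If the variance in real data is much larger than the mean
# (overdispersion), a quasi-Poisson or negative binomial model
# is worth considering instead.
```

Running the same comparison on `park$trip` before fitting the model is a cheap diagnostic.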

  5. Determine how to handle multi-purpose trips (i.e., trips with more than one goal). For this specific example we will first explore a simple model and then examine how the number of days spent impacts the overall value of the trip.

  6. Design and conduct the visitor survey (get data from reservation system/ mobile data)

  7. Measure trip costs, including travel expenses and time value

    To get a distance measurement you can use online sources like OSRM. We have already calculated it for this example, but I have provided a code snippet for consideration.

Travel Distance using OSRM

# --- 1. Load Libraries ---
# Install these packages if you haven't already:
# install.packages(c("tidyverse", "httr"))
library(tidyverse)
library(httr)

# --- 2. Configuration ---
# Public OSRM Demo Server URL for the Routing Service
OSRM_URL <- "http://router.project-osrm.org/route/v1/driving/"
BATCH_DELAY <- 0.1 # Delay between API calls (in seconds)

# --- 3. Core OSRM API Call Function ---

get_osrm_route <- function(start_lon, start_lat, end_lon, end_lat) {
  #' Calls the OSRM API to get driving distance (meters) and duration (seconds).
  #' Note: OSRM requires coordinates in Lon,Lat order (Longitude first).
  
  distance_m <- NA_real_
  duration_s <- NA_real_
  
  # Check for missing coordinates
  if (is.na(start_lon) || is.na(start_lat) || is.na(end_lon) || is.na(end_lat)) {
    return(list(Distance_m = distance_m, Duration_s = duration_s))
  }
  
  # Construct the request URL (LON,LAT;LON,LAT)
  coords <- paste0(start_lon, ",", start_lat, ";", end_lon, ",", end_lat)
  url <- paste0(OSRM_URL, coords, "?overview=false")
  
  # Make the API call
  tryCatch({
    response <- GET(url, timeout(5))
    stop_for_status(response) # Check for HTTP errors (like 400)
    data <- content(response, "parsed")
    
    # Check for success and routes
    if (data$code == 'Ok' && length(data$routes) > 0) {
      route <- data$routes[[1]]
      distance_m <- route$distance
      duration_s <- route$duration
    } else {
      # Handle OSRM internal errors (like 'NoRoute')
      message(paste("  -> OSRM API returned:", data$code, "for coordinates:", coords))
    }
  }, error = function(e) {
    # Catch connection or status errors
    message(paste("  -> OSRM Request Error:", e))
  })
  
  # Return results as a list
  return(list(Distance_m = distance_m, Duration_s = duration_s))
}

# --- 4. Generic Main Function ---

calculate_routes_from_dataframe <- function(df, start_lon_col, start_lat_col, end_lon_col, end_lat_col) {
  #' Calculates OSRM routes between specified coordinate columns in a dataframe.
  #'
  #' @param df The input dataframe.
  #' @param start_lon_col Name of the start longitude column (string).
  #' @param start_lat_col Name of the start latitude column (string).
  #' @param end_lon_col Name of the end longitude column (string).
  #' @param end_lat_col Name of the end latitude column (string).
  #' @return The original dataframe with Distance_m and Duration_s columns appended.

  cat(paste0("\nStarting generic OSRM calculations for ", nrow(df), " entries...\n"))
  
  # Check if all columns exist
  required_cols <- c(start_lon_col, start_lat_col, end_lon_col, end_lat_col)
  if (!all(required_cols %in% names(df))) {
    stop(paste("Required column(s) not found:", paste(setdiff(required_cols, names(df)), collapse = ", ")))
  }
  
  # 1. Prepare data by selecting relevant columns, ensuring numeric, and removing NAs
  df_prepared <- df %>%
    # Select only the relevant coordinate columns for the OSRM processing
    select(all_of(required_cols)) %>%
    mutate(original_index = row_number()) %>%
    # Ensure all are numeric
    mutate(across(all_of(required_cols), as.numeric)) %>%
    # Filter only rows with valid coordinates
    filter(!if_any(all_of(required_cols), is.na))
    
  total_processed <- nrow(df_prepared)
  total_skipped <- nrow(df) - total_processed

  if (total_processed == 0) {
      cat("No valid coordinate pairs found. Returning original dataframe.\n")
      # Append NA columns to the original dataframe if no routes were processed
      return(df %>% mutate(Distance_m = NA_real_, Duration_s = NA_real_))
  }

  cat(paste0("-> Found ", total_skipped, " entries with missing coordinates that will be skipped.\n"))
  cat(paste0("-> Processing ", total_processed, " valid entries now.\n"))
  
  # 2. Iterate and Call OSRM API
  
  # Vectors to store results for the rows being processed
  distance_vec <- rep(NA_real_, total_processed)
  duration_vec <- rep(NA_real_, total_processed)
  
  for (i in 1:total_processed) {
    row <- df_prepared[i, ]
    
    if (i %% 100 == 0 || i == 1 || i == total_processed) {
      cat(paste0("Processed row ", i, "/", total_processed, "\n"))
    }
    
    results <- get_osrm_route(
      row[[start_lon_col]], row[[start_lat_col]], 
      row[[end_lon_col]], row[[end_lat_col]]
    )
    
    distance_vec[i] <- results$Distance_m
    duration_vec[i] <- results$Duration_s
    
    Sys.sleep(BATCH_DELAY)
  }
  
  # 3. Merge results back to the original full dataframe
  
  df_results <- df_prepared %>%
    select(original_index) %>%
    mutate(
      Distance_m = distance_vec,
      Duration_s = duration_vec
    )
  
  # Perform a left join on the original row index
  df_final <- df %>%
    mutate(original_index = row_number()) %>%
    left_join(df_results, by = "original_index") %>%
    select(-original_index)
    
  cat("\n========================================================================\n")
  cat("Generic OSRM calculation complete. Returning augmented dataframe.\n")
  cat("========================================================================\n")
  
  return(df_final)
}

# --- 5. Example Usage ---
#
# # 1. Create a dummy dataframe:
# # my_data <- tibble(
# #   site_id = 1:3,
# #   warehouse_lon = c(-157.8, -157.9, -158.0),
# #   warehouse_lat = c(21.3, 21.4, 21.5),
# #   customer_lon = c(-157.7, -157.8, NA), # NA added to test skipping
# #   customer_lat = c(21.4, 21.5, 21.6)
# # )
# #
# # 2. Call the function, specifying your column names:
# # result_df <- calculate_routes_from_dataframe(
# #   df = my_data,
# #   start_lon_col = "warehouse_lon",
# #   start_lat_col = "warehouse_lat",
# #   end_lon_col = "customer_lon",
# #   end_lat_col = "customer_lat"
# # )
#
# # The result will have the original columns plus Distance_m and Duration_s.

Thankfully for us, the data already has the travel distance calculated for each zip code in the dataset.

  8. Estimate the model using the collected data

We are using a zonal method to calculate the trips. Zonal travel cost means that for each zone you measure the distance to the park; in this case each zone is a zip code.

A Poisson model (specifically Poisson regression) is a type of generalized linear model (GLM) designed to model count data, where the response variable represents counts of events (e.g., number of trips, visits, accidents).

This model accounts for the following:

  1. Count outcomes (non-negative integers)

    • Models outcomes like 0, 1, 2, 3...

    • Cannot produce negative predictions.

  2. The fact that the variance equals the mean

    • In a Poisson distribution, the mean (μ) is equal to the variance (σ²).

    • This is a key assumption of the model:

      \[ Var(Y)=E(Y) \]

  3. Skewed distribution

    • Count data are often right-skewed (many small values, few large ones), which the Poisson model handles better than linear regression.
  4. Log-linear relationship between predictors and expected counts

    • The model assumes a logarithmic link between the mean of the outcome and the predictors:

      \[ \log(\mu)=\beta_0+\beta_1 X_1+\beta_2 X_2+\dots \]

  5. Independent events

    • Each event (e.g., each person’s number of trips) is assumed to be independent of others.
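Putting these pieces together, the log link means predicted counts are always positive and respond multiplicatively to the predictors. A small illustration with made-up coefficients (hypothetical values, not estimates from the park data):

```r
# Expected count under a log link: mu = exp(b0 + b1 * cost).
# Coefficients are hypothetical, for illustration only.
b0 <- 1.2
b1 <- -0.003
cost <- c(10, 100, 300)
mu <- exp(b0 + b1 * cost)
round(mu, 2)  # always positive, declining in cost
```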

Count Model

Simple cost and the decision of a trip

model1 <- glm(trip ~ cost,
              data = park,
              family = poisson())
summary(model1)

Call:
glm(formula = trip ~ cost, family = poisson(), data = park)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.255e+00  8.017e-03  156.59   <2e-16 ***
cost        -2.716e-03  4.525e-05  -60.01   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 51093  on 31584  degrees of freedom
Residual deviance: 47034  on 31583  degrees of freedom
  (1421 observations deleted due to missingness)
AIC: 124676

Number of Fisher Scoring iterations: 5
  9. Calculate the access value by estimating consumer surplus

Simple WTP

1/model1$coefficients[2]
     cost 
-368.2565 

The interpretation suggests that the consumer surplus for each person who camps at this park is, on average, about $368 (the absolute value of the inverse of the cost coefficient).
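As a check on that number, the per-trip consumer surplus is the negative inverse of the cost coefficient reported above:

```r
# Per-trip consumer surplus from the semi-log demand curve: -1 / beta_cost.
# The coefficient below is the estimate reported in the summary above.
beta_cost <- -2.716e-03
cs_per_trip <- -1 / beta_cost
round(cs_per_trip)  # about 368
```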

More Controls

Multiple variable regression: Controlling for more factors in the model

model2 <- glm(trip ~ cost + income + factor(year) + temp_avg + mdays + factor(sitetype),
              data = park,
              family = poisson())

summary(model2)

Call:
glm(formula = trip ~ cost + income + factor(year) + temp_avg + 
    mdays + factor(sitetype), family = poisson(), data = park)

Coefficients:
                                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)                       -1.336e+00  7.652e-02 -17.459  < 2e-16 ***
cost                              -3.308e-03  4.894e-05 -67.585  < 2e-16 ***
income                             6.063e-06  1.888e-07  32.106  < 2e-16 ***
factor(year)2019                  -3.635e-02  1.275e-02  -2.851 0.004356 ** 
factor(year)2020                   5.408e-02  1.502e-02   3.599 0.000319 ***
factor(year)2021                   5.592e-02  1.246e-02   4.489 7.17e-06 ***
factor(year)2022                   1.822e-02  1.292e-02   1.410 0.158455    
factor(year)2023                   5.927e-03  1.392e-02   0.426 0.670318    
temp_avg                           2.020e-02  5.505e-04  36.694  < 2e-16 ***
mdays                              3.685e-02  2.914e-03  12.646  < 2e-16 ***
factor(sitetype)ADA Standard Full  1.646e-02  9.708e-02   0.170 0.865350    
factor(sitetype)ADA TENT           1.324e-01  1.047e-01   1.265 0.205737    
factor(sitetype)GROUP TENT ONLY   -9.854e-02  8.400e-02  -1.173 0.240766    
factor(sitetype)HOST SITE          2.820e-01  1.917e-01   1.471 0.141265    
factor(sitetype)STANDARD           8.083e-01  6.718e-02  12.032  < 2e-16 ***
factor(sitetype)STANDARD - FULL    7.389e-01  6.729e-02  10.982  < 2e-16 ***
factor(sitetype)TENT SITE          1.146e+00  6.706e-02  17.090  < 2e-16 ***
factor(sitetype)YURT               7.164e-01  6.754e-02  10.608  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 51093  on 31584  degrees of freedom
Residual deviance: 40221  on 31567  degrees of freedom
  (1421 observations deleted due to missingness)
AIC: 117894

Number of Fisher Scoring iterations: 5

WTP

1/model2$coefficients[2]
    cost 
-302.317 

We can see that with more controls our estimate falls and is more conservative.

The consumer surplus is now around $302 per person.

From here you could sum trips over the season or year to calculate the WTP for this specific site, for example to evaluate a policy that would impact the park or its funding. This is helpful in estimating the total use value (consumer surplus) of one site.
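To make the aggregation step concrete, a sketch with placeholder numbers (the visitation total is hypothetical, not from the park data):

```r
# Scale per-trip consumer surplus to an annual total.
cs_per_trip    <- 302     # |1 / cost coefficient| from the model above
trips_per_year <- 50000   # hypothetical annual trips; replace with the real total
total_cs <- cs_per_trip * trips_per_year
total_cs  # 15,100,000 under these assumptions
```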

Multi-Site

A single-site model has a lot of caveats, however. You can't say much about the impacts on other parks: people may simply substitute a different park, since visitors look at sites as a package.

In that case we use a multi-site approach: estimating demand systems, comparing multiple sites, and valuing changes in site characteristics or policy scenarios.

Use multi-site models when:

  • You want to evaluate relative site quality or rank sites.

  • You have data on multiple alternative sites and want to understand visitor choice behavior.

  • You’re interested in estimating marginal values of site attributes (e.g., distance, facilities, congestion).

Common models used:

  • Random Utility Models (RUMs) or

  • Nested Logit Models
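As a sketch of the random utility setup (the notation here is mine, not from this page): visitor \(i\) chooses the site \(j\) with the highest utility,

\[ U_{ij} = \beta_1 TC_{ij} + \beta_2 X_j + \varepsilon_{ij} \]

where \(TC_{ij}\) is the travel cost from \(i\) to \(j\) and \(X_j\) is a site attribute. If \(\varepsilon_{ij}\) is i.i.d. type I extreme value, the choice probabilities take the conditional logit form

\[ P_{ij} = \frac{e^{\beta_1 TC_{ij} + \beta_2 X_j}}{\sum_k e^{\beta_1 TC_{ik} + \beta_2 X_k}} \]

and the marginal willingness to pay for the attribute is \(-\beta_2 / \beta_1\).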

References

Clawson, Marion. 1959. Methods of Measuring the Demand for and Value of Outdoor Recreation. Washington, D.C.: Resources for the Future. 36 pp.