Discrete Choice Experiment

Stated Preference : DCE

Discrete Choice Experiments (DCEs) present respondents with several choice scenarios, each containing multiple alternatives described by various attributes and their levels. Respondents choose their preferred alternative in each scenario. R packages facilitate the design of DCEs (e.g., using orthogonal main-effect designs) and the analysis of choice data using models like conditional and binary logit.

Our example follows this book:

Environmental Valuation with Discrete Choice Experiments in R (Mariel et al. 2025)

These are the libraries you need to run the code below:

# Install any missing packages first, e.g. install.packages("spdesign")

#install.packages("Rfast")
#install.packages("spdesign")
#install.packages("ggplot2")
#install.packages("tidyr")
#install.packages("tibble")


library(Rfast)
library(spdesign)
library(ggplot2)
library(tidyr)
library(tibble)

We will use the example that the book above uses throughout its chapters.

Attribute & Levels

Attribute	Label	Levels
Size of wind farm (discrete; reference is LargeFarm)	SmallFarms	0, 1
	MediumFarms	0, 1
Max. turbine height (discrete; reference is HighHeight)	LowHeight	0, 1
	MediumHeight	0, 1
Reduction in red kite (continuous)	RedKite	5, 7.5, 10, 12.5, 15
Distance to residents (continuous)	MinDistance	750, 1000, 1250, 1500, 1750
Monthly cost (continuous)	Cost	0, 1, 2, 3, 4, …, 10

Choice Set

Let's first consider this example.

This choice card has 3 alternatives, so you are estimating 3 different utility functions.

Alternative 1 is the status quo: opting out and keeping things the way they are. Its utility function would look something like this:

\[ \begin{aligned} U_{n1t} =\; & \beta_{mf} \, Med.Farms_{n1t} + \beta_{sf} \, SmallFarms_{n1t} \\ & + \beta_{mh} \, Med.Height_{n1t} + \beta_{lh} \, LowHeight_{n1t} \\ & + \beta_{rk} \, redKite_{n1t} + \beta_{md} \, MinDistance_{n1t} \\ & + \beta_{cost} \, Cost_{n1t} + \epsilon_{n1t} \end{aligned} \]

Program B is alternative 2 and is therefore indexed by 2:

\[ \begin{aligned} U_{n2t} =\; & \beta_{mf} \, Med.Farms_{n2t} + \beta_{sf} \, SmallFarms_{n2t} \\ & + \beta_{mh} \, Med.Height_{n2t} + \beta_{lh} \, LowHeight_{n2t} \\ & + \beta_{rk} \, redKite_{n2t} + \beta_{md} \, MinDistance_{n2t} \\ & + \beta_{cost} \, Cost_{n2t} + \epsilon_{n2t} \end{aligned} \]

Program C is alternative 3 and is therefore indexed by 3:

\[ \begin{aligned} U_{n3t} =\; & \beta_{mf} \, Med.Farms_{n3t} + \beta_{sf} \, SmallFarms_{n3t} \\ & + \beta_{mh} \, Med.Height_{n3t} + \beta_{lh} \, LowHeight_{n3t} \\ & + \beta_{rk} \, redKite_{n3t} + \beta_{md} \, MinDistance_{n3t} \\ & + \beta_{cost} \, Cost_{n3t} + \epsilon_{n3t} \end{aligned} \]

Experimental Design

We will now look at a full factorial design for the entire choice set and all the levels. With the attribute levels used in the code below, the number of possible combinations balloons to more than 5 million rows!

# Create the full factorial using a named list of attributes and levels in the wide format
full_fact <- full_factorial(
  list(
    alt1_sq = 1,
    alt1_farm = 0,
    alt1_height = 0,
    alt1_redkite = 0,
    alt1_distance = 0,
    alt1_cost = 0,
    alt2_sq = 0,
    alt2_farm = c(1, 2, 3),
    alt2_height = c(1, 2, 3),
    alt2_redkite = c(-5, -2.5, 0, 2.5, 5),
    alt2_distance = c(0, 0.25, 0.5, 0.75, 1),
    alt2_cost = 1:10,
    alt3_sq = 0,
    alt3_farm = c(1, 2, 3),
    alt3_height = c(1, 2, 3),
    alt3_redkite = c(-5, -2.5, 0, 2.5, 5),
    alt3_distance = c(0, 0.25, 0.5, 0.75, 1),
    alt3_cost = 1:10
  )
)

# Show the first six rows of column 1 and columns 8 to 12 of the design matrix
full_fact[1:6, c(1, 8:12)]

  alt1_sq alt2_farm alt2_height alt2_redkite alt2_distance alt2_cost
1       1         1           1           -5             0         1
2       1         2           1           -5             0         1
3       1         3           1           -5             0         1
4       1         1           2           -5             0         1
5       1         2           2           -5             0         1
6       1         3           2           -5             0         1
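
The size of the full factorial can be checked by hand before building it. The sketch below is plain base R and assumes only the level counts used in the `full_factorial()` call above; the names `levels_per_alt`, `combos_per_alt` and `n_rows` are illustrative and not part of spdesign.

```r
# Number of levels per attribute for one non-status-quo alternative,
# taken from the full_factorial() call above
levels_per_alt <- c(farm = 3, height = 3, redkite = 5, distance = 5, cost = 10)

# Number of distinct profiles for a single alternative
combos_per_alt <- prod(levels_per_alt)

# The status quo is fixed, so the design size is
# (profiles for alt2) x (profiles for alt3)
n_rows <- combos_per_alt^2

combos_per_alt  # 2250
n_rows          # 5062500
```

Adding one more five-level attribute to both alternatives would multiply the row count by 25, which is why full factorials quickly become impractical.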

As the number of attributes, levels, and alternatives increases, full factorial designs become less practical for several reasons:

Duplicate alternatives: Some choice tasks may repeat the same alternative, which doesn’t help us learn anything new about preferences.
Dominated alternatives: Some options in a choice task might be clearly worse (or better) than others in every way. These don’t help reveal trade-offs because people will always pick the best one, making the data less useful.
Lack of control: The full factorial includes all possible combinations, even unrealistic ones. For example, we might want to prevent small wind farms from showing up with the highest red kite impact.
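
A minimal base R sketch of how duplicate and dominated alternatives could be flagged in a candidate set. The data frame `toy` is a hypothetical two-attribute example, not the design above, and the dominance rule assumes that a higher `farm` code and a lower `cost` are always preferred; both assumptions would need to match your own priors.

```r
# Hypothetical candidate set with two attributes per alternative
toy <- expand.grid(
  alt2_farm = 1:3, alt2_cost = 1:2,
  alt3_farm = 1:3, alt3_cost = 1:2
)

# Duplicate: both alternatives are identical on every attribute
is_duplicate <- toy$alt2_farm == toy$alt3_farm & toy$alt2_cost == toy$alt3_cost

# Weak dominance under the assumed preference directions
# (higher farm code is better, lower cost is better)
alt2_weakly_better <- toy$alt2_farm >= toy$alt3_farm & toy$alt2_cost <= toy$alt3_cost
alt3_weakly_better <- toy$alt3_farm >= toy$alt2_farm & toy$alt3_cost <= toy$alt2_cost

# Dominated task: one alternative is at least as good on everything and the
# two are not identical
is_dominated <- (alt2_weakly_better | alt3_weakly_better) & !is_duplicate

# Keep only tasks that force a trade-off
clean <- toy[!(is_duplicate | is_dominated), ]

c(all = nrow(toy), duplicates = sum(is_duplicate),
  dominated = sum(is_dominated), kept = nrow(clean))
```

In this toy set only 6 of the 36 tasks involve a genuine trade-off, where one alternative is better on `farm` and the other is cheaper.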

Logical Operators

Suppose we want to impose logical restrictions on the design. For example, tall wind turbines cannot be placed too close to residential areas (this could already be the law, so the restriction makes the design a more accurate reflection of reality).

candidate_set <- full_fact[
  !((full_fact$alt2_height == 1 & full_fact$alt2_distance < 0.75) |
      (full_fact$alt3_height == 1 & full_fact$alt3_distance < 0.75)),
]

candidate_set[1:6, c(1, 8:12)]

     alt1_sq alt2_farm alt2_height alt2_redkite alt2_distance alt2_cost
6754       1         1           2           -5             0         1
6755       1         2           2           -5             0         1
6756       1         3           2           -5             0         1
6757       1         1           3           -5             0         1
6758       1         2           3           -5             0         1
6759       1         3           3           -5             0         1

This reduces the candidate set to about 3.2 million rows, which is still far too large to use directly in a survey. We therefore move on to the next approach: a D-efficient design.
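
The effect of the restriction can be checked without building the multi-million-row table. Because the rule is applied to each alternative separately, it is enough to count the allowed profiles for one alternative and square the result. This is a base R sketch; `one_alt` and `allowed` are illustrative names, and the levels are copied from the full factorial call above.

```r
# All profiles for a single non-status-quo alternative
one_alt <- expand.grid(
  farm = 1:3,
  height = 1:3,
  redkite = c(-5, -2.5, 0, 2.5, 5),
  distance = c(0, 0.25, 0.5, 0.75, 1),
  cost = 1:10
)

# Apply the same rule as above: height level 1 may not be combined with a
# distance below 0.75
allowed <- one_alt[!(one_alt$height == 1 & one_alt$distance < 0.75), ]

nrow(one_alt)     # 2250 profiles before the restriction
nrow(allowed)     # 1800 profiles after the restriction
nrow(allowed)^2   # 3240000 rows in the restricted candidate set
```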

D-efficient Design

In statistics, we often try to reduce standard errors to improve the precision of our estimates. The same idea applies in Discrete Choice Experiments (DCEs). We want to design choice tasks that give us the most precise information.

Think of it this way:

When fitting a model, we already have the data and estimate the parameters that best explain it.
When designing a DCE, we do the reverse: we assume values for the parameters (called priors) and then search for the combination of attributes and levels that will give us the most information — that is, the lowest standard errors or lowest D-error.
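
To make the D-error concrete, here is a hand-rolled base R sketch for a multinomial logit model. It is an illustration under stated assumptions, not how spdesign computes the criterion internally: the function name `mnl_d_error`, the two-task toy design and the zero priors are all hypothetical. The D-error is taken as the determinant of the inverse Fisher information matrix, raised to the power 1/K, where K is the number of parameters.

```r
# D-error of an MNL design.
# tasks: list of matrices, one per choice task (rows = alternatives,
#        columns = attributes); beta: vector of prior parameter values
mnl_d_error <- function(tasks, beta) {
  k <- length(beta)
  info <- matrix(0, k, k)
  for (x in tasks) {
    v <- drop(x %*% beta)          # systematic utilities
    p <- exp(v) / sum(exp(v))      # MNL choice probabilities
    # Contribution of this task to the Fisher information matrix
    info <- info + t(x) %*% (diag(p) - tcrossprod(p)) %*% x
  }
  det(solve(info))^(1 / k)
}

# Hypothetical design: two tasks, two alternatives, two attributes
tasks <- list(
  rbind(c(1, 0), c(0, 1)),
  rbind(c(1, 1), c(0, 0))
)

d_once  <- mnl_d_error(tasks, beta = c(0, 0))
d_twice <- mnl_d_error(c(tasks, tasks), beta = c(0, 0))

d_once   # 2
d_twice  # 1
```

Repeating every task doubles the information and halves the D-error, which mirrors the usual link between sample size and precision. An efficient design tries to get a low D-error from the attribute combinations themselves rather than from more observations.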

Utility Function

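For our example we need to specify a utility function for each alternative so that the search can evaluate candidate sets of choice cards. The utility functions were written out in the Choice Set section; here we write each alternative in the syntax of the spdesign library. The number in square brackets after a parameter is its prior, and the values in square brackets after an attribute are its levels. Parameters ending in _dummy are dummy coded, which is why the generated design has separate columns such as alt2_farm2 and alt2_farm3.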

utility <- list(
  alt1 = "b_sq[0] * sq[1]",
  alt2 = "b_farm_dummy[c(0.25, 0.5)] * farm[c(1, 2, 3)] +
          b_height_dummy[c(0.25, 0.5)] * height[c(1, 2, 3)] +
          b_redkite[-0.05] * redkite[c(-5, -2.5, 0, 2.5, 5)] +
          b_distance[0.5] * distance[c(0, 0.25, 0.5, 0.75, 1)] +
          b_cost[-0.05] * cost[seq(1, 10)]",
  alt3 = "b_farm_dummy * farm + b_height_dummy * height +
          b_redkite * redkite + b_distance * distance + b_cost * cost"
)

Generating Design

In the spdesign library, generate_design() is the function that generates efficient experimental designs. It takes a set of indirect utility functions and searches for an efficient design under the assumption that respondents maximize utility.

Here are the arguments needed for our example:

`utility`	A named list of utility functions. See the examples and the vignette for examples of how to define these correctly for different types of experimental designs.
`rows`	An integer giving the number of rows in the final design
`model`	A character string indicating the model to optimize the design for. Currently the only model programmed is the ‘mnl’ model and this is also set as the default.
`efficiency_criteria`	A character string giving the efficiency criteria to optimize for. One of ‘a-error’, ‘c-error’, ‘d-error’ or ‘s-error’. No default is set and argument must be specified. Optimizing for multiple criteria is not yet implemented and will result in an error.
`algorithm`	A character string giving the optimization algorithm to use. No default is set and the argument must be specified to be one of ‘rsc’, ‘federov’ or ‘random’.

# Generate design ----
design <- generate_design(utility, rows = 100,
model = "mnl", efficiency_criteria = "d-error", algorithm = "rsc")

── Checking function arguments ──

ℹ The cycling part of the algorithm is not used. It only applies to a
small subset of designs. The algorithm swithes between relabeling of
attribute levels and swapping of attributes.

── Preparing the list of priors ──

✔ Priors prepared successfully

── Evaluating designs ──────────────────────────────────────────────────────────


──────────────────────────────────────────────────────────────────────────────── 
 Iteration   A-error   C-error   D-error   S-error               Time stamp
──────────────────────────────────────────────────────────────────────────────── 
         1    0.1267       N/A    0.0422       Inf2025-11-14 17:45:13.699565
────────────────────────────────────────────────────────────────────────────

ℹ Efficiency criteria is less than threshhold.

── Cleaning up design environment ──────────────────────────────────────────────

Time spent searching for designs:  0.0545609

summary(design)

---------------------------------------------------------------------
An 'spdesign' object

Utility functions:
alt1 : b_sq * alt1_sq 
alt2 : b_farm_dummy * alt2_farm + b_height_dummy * alt2_height + b_redkite * alt2_redkite + b_distance * alt2_distance + b_cost * alt2_cost 
alt3 : b_farm_dummy * alt3_farm + b_height_dummy * alt3_height + b_redkite * alt3_redkite + b_distance * alt3_distance + b_cost * alt3_cost 


  a-error   c-error   d-error   s-error 
0.1266935       NaN 0.0421518       Inf 

---------------------------------------------------------------------

Printing the first few rows of the design 
# A tibble: 6 × 15
  alt1_sq alt2_farm2 alt2_farm3 alt2_height2 alt2_height3 alt2_redkite
    <dbl>      <dbl>      <dbl>        <dbl>        <dbl>        <dbl>
1       1          1          0            0            0          0  
2       1          0          1            1            0         -2.5
3       1          1          0            1            0          5  
4       1          0          1            0            1          2.5
5       1          1          0            0            0          2.5
6       1          1          0            0            0         -2.5
# ℹ 9 more variables: alt2_distance <dbl>, alt2_cost <dbl>, alt3_farm2 <dbl>,
#   alt3_farm3 <dbl>, alt3_height2 <dbl>, alt3_height3 <dbl>,
#   alt3_redkite <dbl>, alt3_distance <dbl>, alt3_cost <dbl>

---------------------------------------------------------------------

Correlation

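The next step is to check the correlations between the design columns. The status quo constant alt1_sq takes the same value in every row, so its standard deviation is zero and its correlations are reported as NA, which explains the warning. The correlations of about -0.50 between the two dummy columns of the same attribute are mechanical, because the two dummies can never both equal 1. All remaining correlations are small; the largest in absolute value is about 0.27.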

# Correlation matrix
cor(design)

Warning in stats::cor(x[["design"]], y = NULL, use = "everything", method =
c("pearson", : the standard deviation is zero

              alt1_sq   alt2_farm2    alt2_farm3 alt2_height2 alt2_height3
alt1_sq             1           NA            NA           NA           NA
alt2_farm2         NA  1.000000000 -5.037175e-01 -0.009876814  0.019607843
alt2_farm3         NA -0.503717523  1.000000e+00 -0.040253279 -0.009876814
alt2_height2       NA -0.009876814 -4.025328e-02  1.000000000 -0.503717523
alt2_height3       NA  0.019607843 -9.876814e-03 -0.503717523  1.000000000
alt2_redkite       NA  0.119416287 -1.052661e-01  0.045114057  0.074635179
alt2_distance      NA  0.029854072 -6.015208e-02 -0.150380191  0.164197394
alt2_cost          NA -0.014699129 -9.995682e-02  0.070339983 -0.044097386
alt3_farm2         NA  0.079912406 -4.025328e-02  0.095431931 -0.054771424
alt3_farm3         NA -0.114081996  7.991241e-02 -0.009876814  0.064171123
alt3_height2       NA -0.009876814  9.543193e-02  0.004975124  0.079912406
alt3_height3       NA  0.169701625 -1.759385e-01 -0.085481682 -0.099666034
alt3_redkite       NA -0.074635179 -1.202080e-17  0.090228114  0.044781108
alt3_distance      NA -0.059708143 -7.519010e-02 -0.015038019 -0.164197394
alt3_cost          NA  0.007349564 -7.774419e-02 -0.040723148 -0.044097386
              alt2_redkite alt2_distance   alt2_cost   alt3_farm2   alt3_farm3
alt1_sq                 NA            NA          NA           NA           NA
alt2_farm2      0.11941629    0.02985407 -0.01469913  0.079912406 -0.114081996
alt2_farm3     -0.10526613   -0.06015208 -0.09995682 -0.040253279  0.079912406
alt2_height2    0.04511406   -0.15038019  0.07033998  0.095431931 -0.009876814
alt2_height3    0.07463518    0.16419739 -0.04409739 -0.054771424  0.064171123
alt2_redkite    1.00000000    0.04500000 -0.03200379  0.270684343 -0.149270359
alt2_distance   0.04500000    1.00000000  0.04431294  0.075190095 -0.014927036
alt2_cost      -0.03200379    0.04431294  1.00000000 -0.107361027 -0.058796515
alt3_farm2      0.27068434    0.07519010 -0.10736103  1.000000000 -0.503717523
alt3_farm3     -0.14927036   -0.01492704 -0.05879652 -0.503717523  1.000000000
alt3_height2    0.04511406   -0.10526613  0.06293577  0.004975124 -0.009876814
alt3_height3   -0.01503802    0.15038019 -0.01851052  0.095431931 -0.054771424
alt3_redkite    0.03500000   -0.07500000 -0.08616404 -0.030076038 -0.104489251
alt3_distance   0.12500000    0.08500000 -0.07877855 -0.015038019  0.044781108
alt3_cost       0.02215647    0.09847319 -0.09696970  0.070339983  0.007349564
               alt3_height2 alt3_height3  alt3_redkite alt3_distance
alt1_sq                  NA           NA            NA            NA
alt2_farm2    -9.876814e-03   0.16970163 -7.463518e-02 -5.970814e-02
alt2_farm3     9.543193e-02  -0.17593849 -1.202080e-17 -7.519010e-02
alt2_height2   4.975124e-03  -0.08548168  9.022811e-02 -1.503802e-02
alt2_height3   7.991241e-02  -0.09966603  4.478111e-02 -1.641974e-01
alt2_redkite   4.511406e-02  -0.01503802  3.500000e-02  1.250000e-01
alt2_distance -1.052661e-01   0.15038019 -7.500000e-02  8.500000e-02
alt2_cost      6.293577e-02  -0.01851052 -8.616404e-02 -7.877855e-02
alt3_farm2     4.975124e-03   0.09543193 -3.007604e-02 -1.503802e-02
alt3_farm3    -9.876814e-03  -0.05477142 -1.044893e-01  4.478111e-02
alt3_height2   1.000000e+00  -0.49253731 -1.503802e-02  2.838244e-17
alt3_height3  -4.925373e-01   1.00000000 -1.503802e-02  1.203042e-01
alt3_redkite  -1.503802e-02  -0.01503802  1.000000e+00 -1.150000e-01
alt3_distance  2.838244e-17   0.12030415 -1.150000e-01  1.000000e+00
alt3_cost      5.553157e-02  -0.01110631 -2.461830e-02 -6.646941e-02
                 alt3_cost
alt1_sq                 NA
alt2_farm2     0.007349564
alt2_farm3    -0.077744192
alt2_height2  -0.040723148
alt2_height3  -0.044097386
alt2_redkite   0.022156468
alt2_distance  0.098473193
alt2_cost     -0.096969697
alt3_farm2     0.070339983
alt3_farm3     0.007349564
alt3_height2   0.055531566
alt3_height3  -0.011106313
alt3_redkite  -0.024618298
alt3_distance -0.066469405
alt3_cost      1.000000000
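
Reading a 15 × 15 matrix by eye is error-prone, so it can help to reduce it to a single number. The helper below is a base R sketch that works on an ordinary numeric matrix; `max_abs_cor` and the matrix `toy` are hypothetical names and data, and the function is not part of spdesign.

```r
# Largest absolute off-diagonal correlation, ignoring NA entries
max_abs_cor <- function(x) {
  # Constant columns give NA correlations (and a warning), as with alt1_sq
  cm <- suppressWarnings(cor(x))
  max(abs(cm[upper.tri(cm)]), na.rm = TRUE)
}

# Hypothetical data: one constant column and three varying columns
toy <- cbind(
  sq = c(1, 1, 1, 1),
  a  = c(1, 2, 3, 4),
  b  = c(1, 2, 3, 5),
  c  = c(1, -1, -1, 1)
)

max_abs_cor(toy)
```

Here the result is driven by the highly correlated columns `a` and `b`. For a design with dummy-coded attributes, the pairs of dummies belonging to the same attribute would need to be excluded before applying such a summary.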

Attribute Balance

# Print only the first three list elements
level_balance(design)[1:3]

$alt1_sq

  1 
100 

$alt2_farm2

 0  1 
66 34 

$alt2_farm3

 0  1 
67 33

First, we can see that the constant for the status quo alternative appears in all 100 rows of the design. Next, the dummy for farm level 2 equals 1 in 34 rows and the dummy for farm level 3 equals 1 in 33 rows, so the reference level (level 1) appears in the remaining 33 rows. This suggests the design is nearly balanced across attribute levels.
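
The count for the reference level is not printed directly, but it follows from the dummy columns. The base R sketch below rebuilds it from the counts shown above; `level_counts` is a hypothetical helper, and the row order of the two vectors is made up, since only the totals matter.

```r
# Recover the counts of all levels of a dummy-coded attribute.
# dummies: matrix with one 0/1 column per non-reference level
level_counts <- function(dummies) {
  n <- nrow(dummies)
  counts <- colSums(dummies)
  c(reference = n - sum(counts), counts)
}

# Vectors with the same totals as in the output above (34 and 33 ones);
# the ordering of rows is arbitrary
farm2 <- rep(c(1, 0), times = c(34, 66))
farm3 <- rep(c(0, 1, 0), times = c(34, 33, 33))

level_counts(cbind(farm2, farm3))
```

This returns 33 for the reference level, 34 for level 2 and 33 for level 3.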

Dominated Strategy Check

Dominant or dominated alternatives should be avoided because they don’t provide useful information about trade-offs and can bias your results.

To check for this, we can look at the choice probabilities for each alternative. If one option has a probability close to 1, it likely dominates the others. If it’s close to 0, it’s probably dominated.

The spdesign package includes a probabilities() function that calculates these values based on your design and priors. It shows the probability of choosing each alternative in every choice task. Each row of the output adds up to 1.

# Check the utility balance by inspecting the choice probabilities.
# head() avoids printing all 100 rows.
probabilities(design) |>
  head()

          alt1      alt2      alt3
[1,] 0.1804924 0.3288785 0.4906291
[2,] 0.1603367 0.4250793 0.4145840
[3,] 0.1615940 0.1831100 0.6552960
[4,] 0.1937322 0.4532648 0.3530030
[5,] 0.2998234 0.3151957 0.3849809
[6,] 0.3114263 0.3356815 0.3528922
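
Under the multinomial logit model, each probability is the exponentiated utility of an alternative divided by the sum of the exponentiated utilities of all alternatives in the task. The base R sketch below shows that calculation for a single task; `mnl_probabilities` is a hypothetical helper and the utilities in `v` are made-up values, not taken from the design.

```r
# MNL choice probabilities for one choice task
mnl_probabilities <- function(v) {
  ev <- exp(v - max(v))  # subtracting the maximum avoids numerical overflow
  ev / sum(ev)
}

# Hypothetical systematic utilities for the three alternatives
v <- c(alt1 = 0, alt2 = 0.6, alt3 = 1.0)
p <- mnl_probabilities(v)

round(p, 3)
sum(p)  # the probabilities of a task always add up to 1
```

Raising the utility of one alternative far above the others pushes its probability towards 1, which is exactly the dominance pattern the check is meant to catch.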

To help spot any problematic choice tasks, we can create a simple plot. In this case, the plot shows no signs of dominating or dominated alternatives which would appear as very large or very small segments of a single color.

The status quo option (shown in red) has a low probability of being chosen, but it’s not too low to be a problem. What’s considered “too low” depends on the context. For example, in labelled experiments, some options are naturally chosen less often, especially if they represent less common situations.

If the status quo is chosen too often or too rarely compared to your expectations, you should adjust its prior value:

Increase the prior if you expect more people to choose the status quo.
Decrease it if you expect fewer to choose it.

This step highlights why it’s important to check your design and make sure the priors match what you expect from real-world behavior.

# Create a plot to show the choice shares across the design
probabilities(design) |>
  as_tibble() |>
  rowid_to_column() |>
  pivot_longer(-rowid, names_to = "alt", values_to = "prob") |>
  ggplot(aes(x = rowid, y = prob, fill = alt)) +
  geom_bar(position = "fill", stat = "identity") +
  labs(x = "Choice task", y = "Choice probability", fill = "Alternative") +
  scale_x_continuous(breaks = seq(1, 100, by = 2)) +
  scale_fill_discrete(labels = c("SQ", "Alt 2", "Alt 3")) +
  theme_bw() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(angle = 315)
  )

Utility Balance

Next, we check the utility balance of each choice task in our design.

utility_balance <- function(x) {
  # Ensure that it is a matrix (and not a data.frame()/tibble())
  x <- as.matrix(x)

  # Number of available alternatives per task (0 or NA marks a non-available alternative)
  n_alts <- apply(x, 1, function(y) sum(y > 0, na.rm = TRUE))

  # Scale each probability by the equal-share benchmark 1 / n_alts
  x <- x / (1 / n_alts)

  # Replace all zeroes with 1 to enable taking the product
  index_zero <- x == 0
  x[index_zero] <- 1

  # Take the product. This line requires the Rfast package.
  x <- Rfast::rowprods(x)
  return(x)
}

# Use the function for utility balance on the choice probabilities
utility_balance(probabilities(design)) |>
  head()

[1] 0.7863419 0.7629202 0.5235264 0.8369431 0.9823104 0.9960669

The function returns the utility balance for each choice task. The average utility balance across the design is 0.8478. A value of 1 means that all alternatives in a task are equally likely to be chosen. For efficient designs, values between roughly 70% and 90% are typically taken to indicate a good balance: the alternatives are not so similar that choices carry little information, and not so skewed that one alternative dominates.
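
The utility balance of a single task can be reproduced by hand: multiply each probability by the number of alternatives and take the product. The base R sketch below uses the probabilities printed earlier for the first choice task; the variable names are illustrative.

```r
# Choice probabilities of the first task, copied from the output above
p_task1 <- c(alt1 = 0.1804924, alt2 = 0.3288785, alt3 = 0.4906291)
n_alts <- length(p_task1)

# Utility balance: product of the probabilities relative to equal shares
ub_task1 <- prod(p_task1 * n_alts)
ub_task1  # approximately 0.786, matching the first value returned above

# With perfectly equal shares the measure reaches its maximum of 1
ub_equal <- prod(rep(1 / 3, 3) * 3)
ub_equal
```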

Block Design

The design created in this section includes 100 choice tasks, which is far too many for a single respondent to handle effectively. The most widely used solution, and the one shown here, is blocking.

Blocking involves dividing the full design into smaller subsets, or blocks, so that each respondent is only shown the tasks from one block. For example, if pre-testing shows that respondents can comfortably complete 10 tasks, then a 100-task design would be split into 10 blocks of 10 tasks each.

Each choice task still needs to be answered by at least one respondent, so blocking increases the number of participants required. In this case, you’d need at least 10 respondents (one per block), instead of just one respondent completing all 100 tasks. In general, larger designs with blocking demand more respondents to ensure all tasks are adequately covered.

When using a blocked design, each respondent is randomly assigned to a block, and the order of choice tasks within that block is also randomized. Be sure to record the specific choice tasks shown to each respondent so you can accurately reconstruct their responses later.

The blocking column in your design must be orthogonal, meaning it should not be correlated with the other attributes. The block() function from the spdesign package creates a blocking column that minimizes mean squared correlation. However, it does not preserve attribute level balance within each block.

If your overall design is balanced, blocking won’t change that. But keep in mind that in a blocked design, some respondents may never see certain attribute levels, which could affect how realistic the choice tasks feel. Also, depending on the complexity of your design, generating the blocking column may take some time.

# Add a blocking variable to the design with 10 blocks.
design <- block(design, 10)

Warning in stats::cor(design, block): the standard deviation is zero

design$blocks_correlation

# A tibble: 1 × 15
  alt1_sq alt2_farm2 alt2_farm3 alt2_height2 alt2_height3 alt2_redkite
    <dbl>      <dbl>      <dbl>        <dbl>        <dbl>        <dbl>
1      NA   -0.00735    -0.0481     -0.00370      0.00735      -0.0419
# ℹ 9 more variables: alt2_distance <dbl>, alt2_cost <dbl>, alt3_farm2 <dbl>,
#   alt3_farm3 <dbl>, alt3_height2 <dbl>, alt3_height3 <dbl>,
#   alt3_redkite <dbl>, alt3_distance <dbl>, alt3_cost <dbl>

Here, we see that the blocking column is practically uncorrelated with the rest of the design.
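
Once the design is blocked, each respondent needs a randomly drawn block and a randomized task order, and both should be recorded. The base R sketch below illustrates one way to do this. Everything in it is hypothetical: the data frame `blocked` stands in for a blocked design with 10 tasks per block, the sample size of 25 is arbitrary, and `assign_one` is an illustrative helper rather than an spdesign function.

```r
set.seed(42)  # arbitrary seed so that the sketch is reproducible

n_blocks <- 10
tasks_per_block <- 10
n_respondents <- 25  # hypothetical sample size

# Stand-in for a blocked design: task ids with their block labels
blocked <- data.frame(
  task  = seq_len(n_blocks * tasks_per_block),
  block = rep(seq_len(n_blocks), each = tasks_per_block)
)

# Draw a block for one respondent and shuffle the order of its tasks
assign_one <- function(id) {
  b <- sample(n_blocks, 1)
  tasks <- blocked$task[blocked$block == b]
  data.frame(
    respondent = id,
    block = b,
    position = seq_along(tasks),  # order in which the tasks are shown
    task = sample(tasks)          # randomized task order within the block
  )
}

# One row per respondent and task: the record needed to rebuild responses later
survey_plan <- do.call(rbind, lapply(seq_len(n_respondents), assign_one))
head(survey_plan)
```

Purely random assignment can leave some blocks with few respondents in small samples, so in practice it is worth checking the number of respondents per block during fieldwork.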

References

Mariel, Petr, Danny Campbell, Erlend Dancke Sandorf, Jürgen Meyerhoff, Ainhoa Vega-Bayo, and Rebecca Blevins. 2025. Environmental Valuation with Discrete Choice Experiments in R: A Guide on Design, Implementation, and Data Analysis. Vol. 17. The Economics of Non-Market Goods and Resources. Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-89338-4.