Using the Phy Algorithm

Introduction

In this article, I introduce how to use the optimize_phy() function. I’ll work with two examples, one trivial, one not so much. Consider also seeing the vignette("theoretical_framework", package = "phyopt") article for a theoretical background of the algorithm.

Example

The problem of interest is:

\[ \begin{array}{lr} \min_{x, \tilde{x}} x^2 + \tilde{x}^2 ~~s.t.~~ \tilde{x} - 2 \geq 0,\\ X = \tilde{X} = [-10, 10] \end{array} \]

First, let’s load the package:

#> Loading required package: purrr
#> Loading required package: rlang
#>
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#>
#>     %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
#>     flatten_raw, invoke, splice

I’ll also set the objects below, but will explain them later.

stopper <- flow_stopper(list(max_iter = t ~ .m >= 20))
logger <- flow_logger()

Solving The Example

First, let’s define the mathematical objects of the problem:

f <- \(x, xtil) x^2 + xtil^2
g <- \(x, xtil) 0
gtil <- \(xtil) xtil - 2

x_dom <- xtil_dom <- list(c(-10, 10))

Why this format for x_dom? It could be different. In the end, it is whichever format will be useful for the way you’ll define your operators.

Let’s get to it then; for this problem, I’ll choose a random (uniform) sampler for initialization, a golden search for optimization, and a genetic algorithm’s approach of crossover+mutation for the updater. This could’ve been a different choice, and the user can even mix and match choices.

Initializer

The number of samples \(N\) will be fixed at \(10\), so I’ll set it at the start. Then, consider this example of random sampler, that resamples draw if it fails the constraint:

n_samples <- 10

initializer <- function(xtil_dom, gtil) {
  xtil <- rep(NA, n_samples)

  for (s in seq_len(n_samples)) {
    draw <- NA
    while (is.na(draw)) {
      draw <- runif(1, xtil_dom[[1]][1], xtil_dom[[1]][2])
      draw <- ifelse(gtil(draw) >= 0, yes = draw, no = NA)
    }
    xtil[s] <- draw
  }

  matrix(xtil, n_samples, 1)
}

Note that the arguments must be exactly those (with same names). The result is required to be a \(N \times \tilde{m}\) matrix:

initializer(xtil_dom, gtil)
#>           [,1]
#>  [1,] 4.305030
#>  [2,] 8.700652
#>  [3,] 6.810187
#>  [4,] 6.839513
#>  [5,] 7.679364
#>  [6,] 9.896784
#>  [7,] 4.310066
#>  [8,] 5.705723
#>  [9,] 8.288507
#> [10,] 5.874657

Optimizer

Now, let’s do the golden search optimizer. See the Wikipedia page to learn more about it.

optimizer <- function(f_s, g_s, x_dom, t, xtil_s) {
  phi <- 2 / (sqrt(5) + 1)

  a <- x_dom[[1]][1];  b <- x_dom[[1]][2]
  x1 <- b - phi * (b - a);  x2 <- a + phi * (b - a)
  fx1 <- f_s(x1);  fx2 <- f_s(x2)

  iter <- 0

  while (abs(b - a) > 1e-5 && iter < 1000) {
    if (fx1 < fx2) {
      b <- x2
      x2 <- x1; fx2 <- fx1
      x1 <- b - phi * (b - a)
      fx1 <- f_s(x1)
    } else {
      a <- x1
      x1 <- x2; fx1 <- fx2
      x2 <- a + phi * (b - a)
      fx2 <- f_s(x2)
    }
    iter <- iter + 1
  }

  x <- (a + b) / 2
  data.frame(x = I(matrix(x)), xtil = I(matrix(xtil_s)), f = f_s(x), i = NA)
}

Again, the argument names are required. They denote that the optimizer will run once for every sample \(xtil_s\) in the current population \(S_t\). The functions f_s and g_s will receive values as below:

xtil_s <- 5
f_s <- \(x) f(x, xtil_s)
g_s <- \(x) g(x, xtil_s)

optimizer(f_s = f_s, g_s = g_s, x_dom = x_dom, t = 5, xtil_s = xtil_s)
#>              x xtil  f  i
#> 1 2.053031e-06    5 25 NA

Note the required data frame format. Also note how I(.) is used to create a matrix column. Did you know that data frames can have matrices for columns? See this section of [Advanced R (2e)](https://adv-r.hadley.nz/vectors-chap.html?q=I(#matrix-and-data-frame-columns) for more information. This is a very neat trick that enables the very elegant use of the flow_* formulas evaluation.

Also note that one could create an optimizer that randomly chooses from a list of optimizers, use different ones at the same time, for sensitivity reasons.

Joining The Pieces

Let’s join what we have. In simple terms, the algorithm initializes as below:

results <- vector("list", length(stopper$iter_upper))

results[[1]]$xtil <- initializer(xtil_dom, gtil)

Then, it enters a loop (emulated by t <- 1 for its first iteration), and creates the xtil_t (\(S_t\)) variable. Then, for each sample \(s\) we do a loop via map (again, see Advanced R), and save the 1-row data frames in a list. This list then is ‘rbinded’ via list_rbind.

t <- 1
xtil_t <- results[[t]]$xtil

results[[t]] <- list_rbind(map(seq_len(nrow(xtil_t)), function(s) {
  xtil_s <- xtil_t[s, ]
  f_s <- \(x) f(x, xtil_s)
  g_s <- \(x) g(x, xtil_s)
  optimizer(f_s = f_s, g_s = g_s, x_dom = x_dom, t = t, xtil_s = xtil_s)
}))

See how it turns out below:

str(results[[t]])
#> 'data.frame':    10 obs. of  4 variables:
#>  $ x   : 'AsIs' num [1:10, 1] 2.05e-06 2.05e-06 2.05e-06 2.05e-06 2.05e-06 ...
#>  $ xtil: 'AsIs' num [1:10, 1] 2.28 9.74 3.68 3.84 4.51 ...
#>  $ f   : num  5.19 94.85 13.56 14.72 20.3 ...
#>  $ i   : logi  NA NA NA NA NA NA ...

Note the matrices columns. In this case is hard to differentiate because they are 1-column matrices, but this will accommodate any dimension \(m\) and \(\tilde{m}\).

See the full results below:

results[[t]]
#>               x     xtil         f  i
#> 1  2.053031e-06 2.277651  5.187693 NA
#> 2  2.053031e-06 9.738999 94.848101 NA
#> 3  2.053031e-06 3.682554 13.561204 NA
#> 4  2.053031e-06 3.836336 14.717475 NA
#> 5  2.053031e-06 4.505252 20.297293 NA
#> 6  2.053031e-06 5.338789 28.502670 NA
#> 7  2.053031e-06 4.455299 19.849685 NA
#> 8  2.053031e-06 4.395750 19.322614 NA
#> 9  2.053031e-06 4.749405 22.556850 NA
#> 10 2.053031e-06 6.331674 40.090094 NA

Updater

Now let’s move forward and define the genetic algorithm approach to update the sample. We sort \(S_t\) based on the lowest values of \(f\) decreasing = FALSE, as we are looking for a minimum.

Then, we select pairs of values, with greater probability if they have lower values. They are used to create a “child”, which is “mutated” with random multiplicative noise. Finally, the constraint is checked and triggers a resampling if needed.

updater <- function(xtil_dom, gtil, r_t, t) {
  xtil_ordered <- r_t$xtil[order(r_t$f, decreasing = FALSE), 1]
  xtil_new <- matrix(NA, nrow = n_samples, ncol = 1)

  probs <- 1 / (1:n_samples)
  probs <- probs / sum(probs)

  for (s in seq_len(n_samples)) {
    draw <- NA
    while (is.na(draw)) {
      parents <- sample(n_samples, 2, prob = probs)
      draw <- (xtil_ordered[parents[1]] + xtil_ordered[parents[2]]) / 2
      draw <- draw * (1 + runif(1, -0.1, 0.1))
      draw <- ifelse(gtil(draw) >= 0, draw, NA)
    }
    xtil_new[s, 1] <- draw
  }

  xtil_new
}

Again, fixed argument names and a matrix result. See the output below:

updater(xtil_dom = xtil_dom, gtil = gtil, r_t = results[[t]], t = t)
#>           [,1]
#>  [1,] 3.540543
#>  [2,] 4.391573
#>  [3,] 3.247544
#>  [4,] 4.376917
#>  [5,] 3.020130
#>  [6,] 3.885588
#>  [7,] 5.331307
#>  [8,] 3.441450
#>  [9,] 3.019013
#> [10,] 3.568209

The `optimize_phy()` Aggregation

All of these ingredients are combined in the optimize_phy() function, which also tests if:

The functions have the correct arguments.
The domains have the correct length.
The operators have the correct arguments and generate results in the correct format.
Catches errors in the operators calls to help debug your code.
Calculates metrics to check performance, log them in the console, and stop the algorithm if any reaches some desired value.

Metrics

Let’s talk about flow_stopper and flow_logger. See the full help page for details, but flow_stopper receives a list such as:

list(
  max_iter = t ~ .m >= 20,
  max_time = time ~ .m >= 600,
  f_prop = (max(f[[t]]) - max(f[[t - l]])) / abs(max(f[[t - 1]])) ~ .m < 0.01
)

Each element is a formula, with an expression on the LHS that can access the results up until that point, i.e.:

The current iteration t or time.
The sequences of f (a \(t \times 1\) vector) for all iterations.
The sequence of x (a \(t \times m\) matrix) or xtil (a \(t \times \tilde{m}\) matrix).
The sequence of meta-information i, whichever format you made it to be.

That is, the user can define any metric they want. This expression will be evaluated in the current context of the algorithm, via non-standard evaluation (again, Advanced R, yes, that is my favorite R book, how did you know?), using the columns of \(results_t\) as variables (hence the beauty of the matrix-columns).

Then, the RHS is an expression that takes the value of the metric in .m and returns a single boolean.

Finally, the check_expr is an expression that gets the vector of all the booleans (.ms) and combines them with some logical expression into a single boolean, TRUE if the algorithm should be stopped. This allows for stopping the algorithm only if a group of metrics is true.

flow_logger is very similar, but can also access the metrics calculated as variables (hence the names on the list above), and the RHS is a function that formats .m into a single string.

Let’s set them as below. The only stopping criterion is reaching 20 iterations.

stopper <- flow_stopper(list(max_iter = t ~ .m >= 20))
logger <- flow_logger()

These objects are comprised of elements that will be used by the algorithm:

str(stopper)
#> List of 3
#>  $ iter_upper   : num 10000
#>  $ get_metrics  :function (results, t, time)  
#>  $ check_metrics:function (metrics_t, which = FALSE)  
#>  - attr(*, "class")= chr "flow_stopper"

str(logger)
#> List of 1
#>  $ log:function (results_t, t, time, metrics_t)  
#>  - attr(*, "class")= chr "flow_logger"

The Full Algorithm

Let’s run everything!

optima <- optimize_phy(
  f, g, gtil,
  x_dom, xtil_dom,
  initializer, optimizer, updater,
  stopper, logger,
  check_op = 0
)
#> → => Iter: 1 -- mean(f): 32.1783
#> → => Iter: 2 -- mean(f): 18.294
#> → => Iter: 3 -- mean(f): 10.182
#> → => Iter: 4 -- mean(f): 5.9014
#> → => Iter: 5 -- mean(f): 5.8962
#> → => Iter: 6 -- mean(f): 5.155
#> → => Iter: 7 -- mean(f): 4.9935
#> → => Iter: 8 -- mean(f): 4.927
#> → => Iter: 9 -- mean(f): 4.8196
#> → => Iter: 10 -- mean(f): 4.9571
#> → => Iter: 11 -- mean(f): 5.0388
#> → => Iter: 12 -- mean(f): 4.9031
#> → => Iter: 13 -- mean(f): 4.7685
#> → => Iter: 14 -- mean(f): 4.9591
#> → => Iter: 15 -- mean(f): 4.6509
#> → => Iter: 16 -- mean(f): 4.8104
#> → => Iter: 17 -- mean(f): 4.8561
#> → => Iter: 18 -- mean(f): 4.7375
#> → => Iter: 19 -- mean(f): 4.8198
#> → => Iter: 20 -- mean(f): 4.5971
#> 
#> ✔ Finished after 20 iterations. Stopping criteria(s) were:
#> • max_iter: t ~ .m >= 20

str(optima, 1)
#> List of 4
#>  $ results     :List of 20
#>  $ metrics     :List of 20
#>  $ metrics_stop: Named chr "max_iter: t ~ .m >= 20"
#>   ..- attr(*, "names")= chr "max_iter"
#>  $ duration    :List of 4

We can get the duration, which stopping criterias were met, and all the \(R_t\) and \(M_t\), including the best (last):

optima$duration
#> $iters
#> [1] 20
#> 
#> $time_init
#> Time difference of 0.04998708 secs
#> 
#> $time_loop
#> Time difference of 2.879376 secs
#> 
#> $time_total
#> Time difference of 2.929367 secs

optima$metrics_stop
#>                 max_iter 
#> "max_iter: t ~ .m >= 20"

r_t <- optima$results[[20]]
r_t[which.max(r_t$f), ]
#>              x     xtil        f  i
#> 7 2.053031e-06 2.254657 5.083478 NA

We can see that we got very close to the global minimum \((0, 2)\). The efficacy is only due to the number of iters and the user code.