Using the Phy Algorithm

Introduction

In this article, I introduce how to use the optimize_phy() function. I’ll work with two examples, one trivial, one not so much. Consider also seeing the vignette("theoretical_framework", package = "phyopt") article for a theoretical background of the algorithm.

Example

The problem of interest is:

\[ \begin{array}{lr} \min_{x, \tilde{x}} x^2 + \tilde{x}^2 ~~s.t.~~ \tilde{x} - 2 \geq 0,\\ X = \tilde{X} = [-10, 10] \end{array} \]

First, lets load the package:

#> Loading required package: purrr
#> Loading required package: rlang
#>
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#>
#>     %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
#>     flatten_raw, invoke, splice

I’ll also set the objects below, but will explain them later.

stopper <- flow_stopper(list(max_iter = t ~ .m >= 20))
logger <- flow_logger()

Solving The Example

First, lets define the mathematical objects of the problem:

f <- \(x, xtil) x^2 + xtil^2
g <- \(x, xtil) 0
gtil <- \(xtil) xtil - 2

x_dom <- xtil_dom <- list(c(-10, 10))

Why this format for x_dom? It could be different. In the end, it is whichever format will be useful for the way you’ll define your operators.

Lets get to it then, for this problem, I’ll choose a random (uniform) sampler for initialization, a golden search for optimization, and a genetic algorithm’s approach of crossover+mutation for the updater. This could’ve been a different choice, and the user can even mix and match choices.

Initializer

The numbers of samples \(N\) will be fixed as \(10\), so I’ll set it in the beginning. Then, consider this example of random sampler, that resamples draw if it fails the constraint:

n_samples <- 10

initializer <- function(xtil_dom, gtil) {
  xtil <- rep(NA, n_samples)
  
  for (s in seq_len(n_samples)) {
    draw <- NA
    while (is.na(draw)) {
      draw <- runif(1, xtil_dom[[1]][1], xtil_dom[[1]][2])
      draw <- ifelse(gtil(draw) >= 0, yes = draw, no = NA)
    }
    xtil[s] <- draw
  }

  matrix(xtil, n_samples, 1)
}

Note that the arguments must be exactly those (with same names). The result is required to be a \(N \times \tilde{m}\) matrix:

initializer(xtil_dom, gtil)
#>           [,1]
#>  [1,] 3.807865
#>  [2,] 5.848747
#>  [3,] 8.246152
#>  [4,] 8.402323
#>  [5,] 9.525685
#>  [6,] 7.282250
#>  [7,] 5.207031
#>  [8,] 9.935083
#>  [9,] 6.514526
#> [10,] 9.625350

Optimizer

Now, lets do the golden search optimizer. Se the Wikipedia page to learn more about it.

optimizer <- function(f_s, g_s, x_dom, t, xtil_s) {
  phi <- 2 / (sqrt(5) + 1)

  a <- x_dom[[1]][1];  b <- x_dom[[1]][2]
  x1 <- b - phi * (b - a);  x2 <- a + phi * (b - a)
  fx1 <- f_s(x1);  fx2 <- f_s(x2)

  iter <- 0

  while (abs(b - a) > 1e-5 && iter < 1000) {
    if (fx1 < fx2) {
      b <- x2
      x2 <- x1; fx2 <- fx1
      x1 <- b - phi * (b - a)
      fx1 <- f_s(x1)
    } else {
      a <- x1
      x1 <- x2; fx1 <- fx2
      x2 <- a + phi * (b - a)
      fx2 <- f_s(x2)
    }
    iter <- iter + 1
  }

  x <- (a + b) / 2
  data.frame(x = I(matrix(x)), xtil = I(matrix(xtil_s)), f = f_s(x), i = NA)
}

Again, the arguments names are required. They denote that the optimizer will run once for every sample \(xtil_s\) in the current population \(S_t\). The functions f_s and g_s will receive values as below:

xtil_s <- 5
f_s <- \(x) f(x, xtil_s)
g_s <- \(x) g(x, xtil_s)

optimizer(f_s = f_s, g_s = g_s, x_dom = x_dom, t = 5, xtil_s = xtil_s)
#>              x xtil  f  i
#> 1 2.053031e-06    5 25 NA

Note the required data frame format. Also note how I(.) is used to create a matrix column. Did you knew that data frames could have matrices for columns? See this section of [Advanced R (2e)](https://adv-r.hadley.nz/vectors-chap.html?q=I(#matrix-and-data-frame-columns) for more information. This is a very neat trick that enables the very elegant use of the flow_* formulas evaluation.

Also note that one could create an optimizer that randomly chooses from a list of optimizers, use different ones at the same time, for sensitivity reasons.

Joining The Pieces

Lets join what we have. In simple terms, the algorithm initializes as below:

results <- vector("list", length(stopper$iter_upper))

results[[1]]$xtil <- initializer(xtil_dom, gtil)

Then, it enters a loop (emulated by t <- 1 for its first iteration), and creates the xtil_t (\(S_t\)) variable. Then, for each sample \(s\) we do a loop via map (again, see Advanced R), and save the 1-row data frames in a list. This list then is ‘rbinded’ via list_rbind.

t <- 1
xtil_t <- results[[t]]$xtil

results[[t]] <- list_rbind(map(seq_len(nrow(xtil_t)), function(s) {
  xtil_s <- xtil_t[s, ]
  f_s <- \(x) f(x, xtil_s)
  g_s <- \(x) g(x, xtil_s)
  optimizer(f_s = f_s, g_s = g_s, x_dom = x_dom, t = t, xtil_s = xtil_s)
}))

See how it turns out below:

str(results[[t]])
#> 'data.frame':    10 obs. of  4 variables:
#>  $ x   : 'AsIs' num [1:10, 1] 2.05e-06 2.05e-06 2.05e-06 2.05e-06 2.05e-06 ...
#>  $ xtil: 'AsIs' num [1:10, 1] 3.91 9.42 7.17 3.05 7.94 ...
#>  $ f   : num  15.29 88.69 51.42 9.31 63.05 ...
#>  $ i   : logi  NA NA NA NA NA NA ...

Note the matrices columns. In this case is hard to differentiate because they are 1-column matrices, but this will accommodate any dimension \(m\) and \(\tilde{m}\).

See the full results below:

results[[t]]
#>               x     xtil         f  i
#> 1  2.053031e-06 3.910044 15.288445 NA
#> 2  2.053031e-06 9.417402 88.687466 NA
#> 3  2.053031e-06 7.170930 51.422241 NA
#> 4  2.053031e-06 3.051131  9.309402 NA
#> 5  2.053031e-06 7.940594 63.053031 NA
#> 6  2.053031e-06 9.665750 93.426731 NA
#> 7  2.053031e-06 9.101614 82.839379 NA
#> 8  2.053031e-06 9.557790 91.351350 NA
#> 9  2.053031e-06 9.364267 87.689504 NA
#> 10 2.053031e-06 6.238507 38.918964 NA

Updater

Now lets move forward, and define the genetic algorithm approach to update the sample. We sort \(S_t\) based on the lowest values of \(f\) decreasing = FALSE, as we are looking for a minimum.

Then, we select pairs of values, with greater probability if they have lower values. They are used to create a “child”, which is “mutated” with random multiplicative noise. Finally, the constraint is checked and triggers a resampling if needed.

updater <- function(xtil_dom, gtil, r_t, t) {
  xtil_ordered <- r_t$xtil[order(r_t$f, decreasing = FALSE), 1]
  xtil_new <- matrix(NA, nrow = n_samples, ncol = 1)

  probs <- 1 / (1:n_samples)
  probs <- probs / sum(probs)

  for (s in seq_len(n_samples)) {
    draw <- NA
    while (is.na(draw)) {
      parents <- sample(n_samples, 2, prob = probs)
      draw <- (xtil_ordered[parents[1]] + xtil_ordered[parents[2]]) / 2
      draw <- draw * (1 + runif(1, -0.1, 0.1))
      draw <- ifelse(gtil(draw) >= 0, draw, NA)
    }
    xtil_new[s, 1] <- draw
  }

  xtil_new
}

Again, fixed argument names and matrix result. See the output below:

updater(xtil_dom = xtil_dom, gtil = gtil, r_t = results[[t]], t = t)
#>           [,1]
#>  [1,] 3.737857
#>  [2,] 4.986482
#>  [3,] 3.289730
#>  [4,] 4.930159
#>  [5,] 6.081697
#>  [6,] 3.290886
#>  [7,] 5.439877
#>  [8,] 6.143662
#>  [9,] 6.136750
#> [10,] 3.509403

The `optimize_phy` Aggregation

All of these ingredients are combined in the optimize_phy function, which also tests if:

The functions have the correct arguments.
The domains have the correct length.
The operators have the correct arguments and generate results in the correct format.
Catches errors in the operators calls to help debug your code.
Calculates metrics to check performance, log them in the console, and stop the algorithm if any reaches some desired value.

Metrics

Lets talk about flow_stopper and flow_logger. See the full help page for details, but flow_stopper recives a list such as:

list(
  max_iter = t ~ .m >= 20,
  max_time = time ~ .m >= 600,
  f_prop = (max(f[[t]]) - max(f[[t - l]])) / abs(max(f[[t - 1]])) ~ .m < 0.01
)

Each element is a formula, with an expression on the LHS that can access the results up until that point, ie:

The current iteration t or time.
The sequences of f (a \(t \times 1\) vector) for all iterations.
The sequence of x (a \(t \times m\) matrix) or x (a \(t \times \tilde{m}\) matrix).
The sequence of meta-information i, whichever format you made it to be.

That is, the user can define any metric that he wants. This expression will be evaluated in the current context of the algorithm, via non-standard evaluation (again, Advanced R, yes, that is my favorite R book, how did you knew?), using the columns of \(results_t\) as variables (hence the beauty of the matrix-columns).

Then, the RHS is an expression that takes the value of the metric in .m and returns a single boolean.

Finally, the check_expr is an expression that gets the vector of all the booleans (.ms) and combines them with some logical expression into a single boolean, TRUE if the algorithm should be stopped. This allows for stopping the algorithm only if a group of metrics is true.

flow_logger is very similar, but can also access the metrics calculated as variables (hence the names on the list above), and the RHS is a function that formats .m into a single string.

Lets set them as below. The only stopping criteria is reaching 20 iterations.

stopper <- flow_stopper(list(max_iter = t ~ .m >= 20))
logger <- flow_logger()

These objects are comprised of elements that will be used by the algorithm:

str(stopper)
#> List of 3
#>  $ iter_upper   : num 10000
#>  $ get_metrics  :function (results, t, time)  
#>  $ check_metrics:function (metrics_t, which = FALSE)  
#>  - attr(*, "class")= chr "flow_stopper"

str(logger)
#> List of 1
#>  $ log:function (results_t, t, time, metrics_t)  
#>  - attr(*, "class")= chr "flow_logger"

The Full Algorithm

Lets run all!

optima <- optimize_phy(
  f, g, gtil,
  x_dom, xtil_dom,
  initializer, optimizer, updater,
  stopper, logger,
  check_op = 0
)
#> → => Iter: 1 -- mean(f): 29.0643
#> → => Iter: 2 -- mean(f): 12.8244
#> → => Iter: 3 -- mean(f): 8.8746
#> → => Iter: 4 -- mean(f): 4.967
#> → => Iter: 5 -- mean(f): 4.611
#> → => Iter: 6 -- mean(f): 4.8232
#> → => Iter: 7 -- mean(f): 4.5581
#> → => Iter: 8 -- mean(f): 4.5334
#> → => Iter: 9 -- mean(f): 4.6398
#> → => Iter: 10 -- mean(f): 4.6648
#> → => Iter: 11 -- mean(f): 4.7027
#> → => Iter: 12 -- mean(f): 4.754
#> → => Iter: 13 -- mean(f): 4.8174
#> → => Iter: 14 -- mean(f): 4.7678
#> → => Iter: 15 -- mean(f): 5.0016
#> → => Iter: 16 -- mean(f): 4.7318
#> → => Iter: 17 -- mean(f): 4.9175
#> → => Iter: 18 -- mean(f): 4.6491
#> → => Iter: 19 -- mean(f): 4.8853
#> → => Iter: 20 -- mean(f): 4.8674
#> 
#> ✔ Finished after 20 iterations. Stopping criteria(s) were:
#> • max_iter: t ~ .m >= 20

str(optima, 1)
#> List of 4
#>  $ results     :List of 20
#>  $ metrics     :List of 20
#>  $ metrics_stop: Named chr "max_iter: t ~ .m >= 20"
#>   ..- attr(*, "names")= chr "max_iter"
#>  $ duration    :List of 4

We can get the duration, which stopping criterias were met, and all the \(R_t\) and \(M_t\), including the best (last):

optima$duration
#> $iters
#> [1] 20
#> 
#> $time_init
#> Time difference of 0.06554794 secs
#> 
#> $time_loop
#> Time difference of 1.097071 secs
#> 
#> $time_total
#> Time difference of 1.162633 secs

optima$metrics_stop
#>                 max_iter 
#> "max_iter: t ~ .m >= 20"

r_t <- optima$results[[20]]
r_t[which.max(r_t$f), ]
#>              x     xtil        f  i
#> 2 2.053031e-06 2.321075 5.387391 NA

We can see that we got very close to the global minimum \((0, 2)\). The efficacy is only due to the number of iters and the user code.