SUPEROPT: Supervision of Route Optimizers
Department of Aerospace Engineering, University of Bristol
Abstract
The goal of this project is to make numerical trajectory optimizers more suitable for human supervision. Research to date has identified that optimization offers a powerful and flexible tool for trajectorybased ATM operations, applied at a variety of scales of decision making, but rarely considering interaction with a human. This interaction is challenging to achieve, as optimizers typically involve complex search processes and are sensitive to their input parameters. However, the optimizer is based around an internal mathematical model of the system, which will inevitably fail to capture all aspects of behaviour, so human supervision is still most important. This research will seek avenues for input from the supervisor as well as more meaningful feedback from the optimizer about its results.
Introduction and Problem Statement
SESAR identifies a move to trajectorybased operations as the foundation of its future ATM concept:
“A trajectory representing the business/mission intentions of the Airspace Users and integrating ATM and airport constraints is elaborated and agreed for each flight”
It is natural to cast this as an optimization problem, with the “intentions” represented in the objective function, the constraints as mentioned, and the “elaboration” involving a numerical optimization process. Much research has been performed into trajectory optimization, finding that it offers good performance and flexibility, especially in complex scenarios extending beyond pairwise avoidance. SESAR also identifies that humans will be at the core of future ATM systems. This is essential, as the internal models on which automation is based are inevitably limited, compared to a human’s capability to judge the quality of a trajectory. The SESAR model of the life cycle of a trajectory includes stages of evolution involving multiple stakeholders. This interaction of trajectory optimization with human supervision forms the core question of this research:
How can human supervisors interact with trajectory optimization?
This is a question neglected by nearly all current research. From the standpoint of the algorithm designer, the human represents yet another source of uncertainty, and it is logical to leave them out of initial research. Most trajectory optimization research therefore looks at “full automation”, with no human participation in decision making.
Most optimization algorithms could conceivably be made compatible with all levels of Parasuraman’s scale of degrees of automation for supervisory control. Since this only captures a binary “accept/decline” or “request” input, human interaction can be incorporated by means of interfacing as outlined in Figure 1(A) without alteration to the algorithm. The human may initiate the optimization procedure and choose whether or not to adopt the solution resulting. However, a more flexible interaction involves the human influencing how the optimizer works, as outlined in Figure 1(B), not simply whether not its suggestion is implemented. This type of interaction requires modification to the optimizer itself, and it is these modifications that are the subject of this proposal.
Figure 1 (A) The operator controls whether or not the algorithm's decision is implemented. (B) The operator influences the decision of the algorithm.
Project Objectives and Expected Results
The project will investigate and compare a variety of ways for the supervisor to influence the optimizer’s decision making, aiming to balance solver responsiveness, overall performance and supervisor workload. The proposal identifies some initial ideas for mechanisms, including reference paths or corridors; specifying a priority order for objectives rather than weights; and grouping constraints into higherlevel behaviours or “plays”. Techniques from control engineering will also be leveraged to study the stability of humanoptimizer interaction, i.e. to see if they will converge to an agreed answer.
As well as mechanisms for input to the process, the project will investigate what additional output can be provided from the optimizer to help the supervisor make informed interventions. The trade here will be to provide good information on the rationale behind the optimizers suggestions without overloading the supervisor with information. Initial ideas include the provision of sensitivity information or of a ranked list of distinct alternatives.
The purpose of this research is not to develop new trajectory optimizers: these are well covered by existing work. Neither will it develop new human interfaces, which are also wellresearched, especially in ATM. The purpose is to find routes for information into and out of trajectory optimizers, identifying which optimizers and parameterizations are best suited to human supervision and interaction. This is a crucial step to enable future humancentred, highly automated, trajectorybased ATM systems.
Methodology (Ideas)
Constraint Selection and the Trajectory Playbook
Many constraint interactions are straightforward in formulation. An obvious example, wellstudied in the literature, is the creation of nofly zones. Manipulation of available speeds and turns are also easy to implement. The challenge for simple constraint modifications such as these is to manage the shear number of settings. Can they be cleverly grouped to reduce the number of parameters without overly diminishing the level of control of the supervisor?
In a trajectory optimization setting, it is interesting to separate “local” from “global” decision making, analogous to the ideas of local and global optimization. Robotics motion planning identifies the concept of classes of solution. Solutions A and B are in different classes if and only if B cannot be transformed into A by continuous changes in parameters with all intermediate paths feasible as well. For example, in 2D obstacle avoidance, if trajectory A passes the obstacle on the left and trajectory B on the right, A and B are in different classes. Hence “global” decision making refers to determining which class of solutions to consider and “local” refers to which trajectory within the class. The global decisions can also be thought of as the “sense” of resolution of conflicts: does A pass in front of or behind B? Does A arrive before or after B?
Permitting some global decisions to be fixed by the supervisor while leaving local ones to the optimizer offers the potential for useful interaction. This reduces supervisor workload (i.e. the number of parameters) but retains considerable influence over the optimizer behaviour. However, enforcing this constraint is nontrivial in terms of the formulation of the constraints. Research will be pursued to explore mathematical formulations and compatible optimizers for this approach.
The playbook idea of Miller et al offers an interesting possibility for constraint selection. Applied to robots in a Roboflag game, it reduces human interaction to the selection of behaviours or “plays” such as border defence or intruder interception. The existing ATM practice of separating traffic into altitude layers based on a coarse partition of flight direction could be viewed as a play. Figure 2 shows an example output from a MILP optimizer for a potential “roundabout right” play. What other plays can be encoded in constraints on an optimizer?
Figure 2 Example of a “Roundabout” solution in an optimized solution for three vehicles.
Reference Corridors
The concept of a corridor constraint is illustrated in Figure 3, taken from the PI’s recent work on autonomous routing for UAVs. The UAV, assumed here to fly in 2D, is constrained to remain within a reference corridor around a predefined reference path. The trajectory optimizer determines the exact path within the corridor to maximise separation from intruders.
Figure 3: UAV flight with 2D Corridor Constraint. Red square: intruder. Purple circle: UAV flying reference trajectory. Blue diamond: UAV flying optimized trajectory, still within the corridor.
Cooperative Optimization
All of the interactions described above imply the possibility of iteration between a supervisor and an optimizer. Furthermore, it is likely that any future system will involve different aircraft controlled by different agents, some human and others automated. Coupled through the constraint to maintain separation, this scenario also implies iteration between humans and optimizers, as one reacts to the changes in intent of the other. Lyapunov theory, adopted from cooperative MPC, provides a powerful tool for analysing the stability of such a system. This could also be viewed as a problem in game theory, with two agents “playing” with slightly different worldviews. In the ATM case, the agents are not competing in reality, but distributed Model Predictive Control work has found that greedy decision making can still arise.
Figure 5 and Figure 6 below, taken from initial simulations, show two examples where cooperation solves problems cause by greediness. The cooperation involves optimizing not just for the trajectory of the local trajectory but for “candidate” trajectories for neighbours. This gives the local optimizer an incentive to “leave an option” for a neighbour that results in a global improvement. The scenario in Figure 7 was inspired by an example from existing ATM research that was found to be a breaking case for pairwise conflict resolution.
GREEDY 



COOPERATIVE 



Figure 4: Comparison of greedy and cooperative route optimization. 
Figure 5: Comparisons of greedy and cooperative route optimization. 
Soft Constraints and Prioritization
A complication in manipulating constraints is knowing how far to move them. Restricting the decision space too far leads to an infeasible problem. The infeasibility issue can also be addressed by the introduction of soft constraints, in which constraints can be violated but at a penalty to the objective function. Hence the optimizer “prefers” to return a solution respecting the original constraints but, if no such solution exists, returns an answer with minimum violation, in some sense. Most usefully, the concept of exact penalty functions has been developed, determining minimum violation penalty weights such that constraints will be violated if and only if no feasible solution exists.
For human interaction, soft constraints introduce the problem of weighting: once over the minimum threshold for exact penalty, it is hard to know how to set the violation weight. This project will explore the concept of constraint prioritization, in which the user sets only the order of a list of constraints, not the weights themselves. The optimization would be configured such that constraint j will only be violated if no solution exists satisfying all of constraints 1 to j. The utility of this approach is highly dependent on selection of the constraints made available. If the list is too long, it will become impractical for the supervisor to manage effectively.
Gradients, Active Constraints and Sensitivity
To make the most of that capability, the supervisor must have good information on what the optimizer is doing. This idea was captured as part of the humanautomation “etiquette” defined by Miller: “Talk about what you are doing and why.” It is straightforward to extract the full 4D trajectory from an optimizer result and present that to an operator. This research will investigate the provision of additional information that can indicate the rationale behind the decision making.
One easy step is to identify which constraints are active in the solution. This informs the operator which constraints matter in the solution and which do not dominate the current result. For example, if the “corridor +/2nm” constraint is identified as inactive, the operator knows that relaxing it would make no difference to the result.
A natural extension of this is to provide sensitivity analysis: by how much will the solution change as the parameters are changed. For many optimizers, this information is immediately available from the solution process, via Lagrange or Karush Kuhn Tucker multipliers. These are analytically linked to the local sensitivity, informing the user which parameters have the greatest effect on the solution. The research will investigate ways of processing and presenting this information for the operator. Global sensitivity information indicates how far the parameters can be changed with the local sensitivity still valid. For example, it can indicate how far an inactive constraint can be altered before it becomes active and influences the solution. This information can save considerable interaction time making gentle adjustments and resolving, only to find the result unchanged. Again, processing and presentation are important topics for study: the sensitivity must be expressed in terms of whatever inputs the supervisor has available.
Solution Rankings
It could be informative to provide “the best of the bunch” from the solutions considered. The subject of the research would be the generation of this list. This history information offers rationale in the form of a statement “I’m suggesting X because it’s better than Y and Z.” However, it also provides another opportunity for supervisor interaction in the framework of Level 3 of the scale of automation, at which the supervisor chooses from alternatives. Again, the list generation is the key, and few trajectory optimizers have been investigated for their ability to return anything other than a point result. Ideas from robust design and multidisciplinary optimization could be exploited here: in those fields, optimizers are used to provide solution families instead of single answers.
The challenge here is that many results will be similar, as the solver refines a solution with small local steps. For example: a stochastic search could converge on a local minimum after several small steps and then perform a random “jump” before converging on a new, better local minimum. Finding the best of each “cluster” of solutions is a problem similar to mathematical peak detection, and this analogy will be leveraged in the research.
In some cases, the nature of the optimizer can be exploited to identify distinct solutions. In the solution process for a branchandbound optimization for obstacle avoidance, the solver explores a tree of solutions, trying on different sides of obstacles as the trajectory encounters them. The selection of feasible “leaf” nodes provides a list of distinct options.