AutoSAC: automatic scaling and admission control of forwarding graphs

There is a strong industrial drive to use cloud computing technologies and concepts for providing timing-sensitive services in the networking domain, since this would make it possible to share the physical resources among multiple users and thus increase elasticity and reduce costs. In this work, we develop a mathematical model for user-stateless virtual network functions forming a forwarding graph. The model captures uncertainties in the performance of these virtual resources as well as the time overhead needed to instantiate them. The model is used to derive a service controller for horizontal scaling of the virtual resources as well as an admission controller that guarantees that packets exiting the forwarding graph meet their end-to-end deadline. The Automatic Service and Admission Controller (AutoSAC) developed in this work uses feedback and feedforward, making it robust against uncertainties of the underlying infrastructure and giving it a fast reaction time to changes in the input.


Introduction
Over the last years, cloud computing has swiftly transformed the IT infrastructure landscape, leading to large cost savings for deployment of a wide range of IT applications. Physical resources such as compute nodes, storage nodes, and network fabrics are shared among tenants through the use of virtual resources. This makes it possible to dynamically change the amount of resources allocated to a tenant, for example as a function of workload or cost. Initially, the cloud technology was mostly used for IT applications, e.g., web servers, databases, etc., but it has now found its way into new domains. One of these domains is packet processing by a chain of network functions.
In this work, we are considering a chain of network functions through which packets are flowing. Every packet must be processed by each function in the chain within some specific end-to-end deadline. The goal is to ensure that as many packets as possible meet their deadline, while at the same time using as few resources as possible.
The goal is thus to derive a method for controlling the amount of resources allocated to each network function in the chain. Previously, this was usually done by statically allocating some amount of resources to each network function. Since the input is time-varying (see Fig. 1 for a trace of traffic flowing through a switch in the Swedish university network, SUNET), such a strategy usually leads to overallocation of resources for long periods of time (yielding high costs and a large environmental footprint) as well as overload for shorter periods, when the input is large. To ensure that at least some packets meet their deadlines when the network function is overloaded, one has to use admission control, i.e., reject some packets.
Recently, a new option became available through the advances of virtualization technology for networking services. The standardization body ETSI (European Telecommunications Standards Institute) addresses the standardization of these virtual network services under the name Network Functions Virtualization (NFV) [1]. These Virtual Network Functions (VNFs) consist of virtual resources, such as virtual machines (VMs), containers, or even processes running in the OS. Using such VNFs, it is possible to change the resources allocated to a network function by either vertical scaling (i.e., changing the capacity of the allocated VMs) or horizontal scaling (i.e., changing the number of allocated VMs). Horizontal scaling is considered in this work. These VNFs are connected in a graph topology (commonly called a Forwarding Graph), as illustrated in Fig. 2. In this figure, there are two forwarding graphs (corresponding to the blue and red arrows). The blue forwarding graph consists of VNF$_1$, VNF$_2$, VNF$_3$, and VNF$_5$, and the red forwarding graph consists of VNF$_1$, VNF$_2$, VNF$_4$, and VNF$_5$. Each VNF is given a number $m_i \in \mathbb{Z}^+$ of VMs, which are mapped onto the network function virtual infrastructure.
While the benefit of using NFV technologies is scalability and resource sharing, there are two drawbacks as follows: a) Starting a new virtual resource takes time, since it has to be deployed to a physical server and it requires the execution of several initialization scripts and push/pulls before it is ready to serve packets, b) The true performance of the virtual resource differs from the expected performance, since one does not know what else is running on the physical machines [2].
In this work, we
- develop a model of a service-chain of network functions and use it to derive a service controller and an admission controller for the network functions,
- derive a service controller controlling the number of virtual resources (e.g., VMs or containers) allocated to each network function by using feedback from the true performance of the instances as well as feedforward between the network functions,
- derive an admission controller that is aware of the actions of the service controller, which it uses in order to reject as few packets as possible,
- evaluate the service and admission controller using a real-world traffic trace from the Swedish University Network (SUNET).

Related works
There are a number of works considering the problem of controlling virtual resources within data centers, and specifically for virtual network functions. However, many of them focus on orchestration, i.e., how the virtual resources should be mapped onto the physical hardware. Shen et al. [3] develop a management framework, vConductor, for realizing end-to-end virtual network services. In [4], Moens and De Turck develop a formal model for resource allocation of virtual network functions. A slightly different approach is taken by Mehraghdam et al. [5], who define a model for formalizing the chaining of forwarding graphs using a context-free language. They solve the mapping of [...] Scaling of virtual network functions is, however, studied by Mao et al. [6], who develop a mechanism for auto-scaling resources in order to meet some user-specified performance goal. Recently, Wang et al. [7] developed a fast online algorithm for scaling and provisioning VNFs in a data center. However, they do not consider timing-sensitive applications with deadlines for the packets moving through the chain. This is done by Li et al. [8], who present the design and implementation of NFV-RT, which aims at controlling NFVs with soft real-time guarantees, allowing packets to have deadlines.
The enforcement of an end-to-end deadline for a sequence of jobs is, however, addressed by several works, possibly under different terminologies. Di Natale and Stankovic [9] propose to split the E2E deadline proportionally to the local computation time or to divide the slack time equally. Later, Jiang [10] used time slices to decouple the schedulability analysis of each node, reducing the complexity of the analysis. Such an approach improves the robustness of the schedule and allows each pipeline to be analyzed in isolation. Serreli et al. [11,12] proposed to assign local deadlines to minimize a linear upper bound of the resulting local demand bound functions. More recently, Hong et al. [13] formulated the local deadline assignment problem as a MILP with the goal of maximizing the slack time.
An alternate analysis was proposed by Jayachandran and Abdelzaher [14], who developed several transformations to reduce the analysis of a distributed system to the single-processor case. Henriksson et al. [15] proposed a feedforward/feedback controller to adjust the processing speed to match a given delay target.

Modeling the service-chain
In this section, we present a general model of the forwarding graph and virtual network functions presented in Section 1. We consider a service-chain consisting of $n$ functions $F_1, \ldots, F_n$, as illustrated in Fig. 3. Packets flow through the service-chain and must be processed by each function in the chain within some end-to-end deadline. A fluid model is used to approximate the packet flow, and at time $t$ there are $r_i(t) \in \mathbb{R}^+$ packets per second (pps) entering the $i$th function. A recent benchmarking study showed that a typical virtual machine can process around 0.1-2.8 million packets per second [16]. Hence, in this work, the number of packets flowing through the functions is assumed to be on the order of millions of packets per second, supporting the use of a fluid model.
Fig. 3 Illustration of the service-chain
A function consists of several parts, as illustrated in Fig. 4: an admission controller, a service controller, $m_i(t)$ instances, a buffer, and a load balancer. It is assumed that all the parts of a function are located at the same location, e.g., the same data center or rack. In [17], Google showed that less than 1 μs of the latency in a data center was due to the propagation in the network fabric. Hence, communication delay within a function is neglected.

Admission controller
Every packet that enters the service-chain must be processed by all of the functions in the chain within a certain end-to-end (E2E) deadline, denoted $D^{\max}$. This deadline can be split into local deadlines $D_i(t)$, one for each function in the chain, such that the packet should not spend more than $D_i(t)$ time units in the $i$th function. Should a packet miss its E2E deadline, it is considered useless. It is thus favorable to use admission control to drop packets that have a high probability of missing their deadline in order to make room for following packets. The goal of the admission controller is to guarantee that the packets that make it through the service-chain do meet their E2E deadline. It is assumed to be possible to do admission control at the entry of every function in the chain.
Packets are admitted into the buffer of function $F_i$ based on the admittance flag $\alpha_i(t) \in \{0, 1\}$. If $\alpha_i(t) = 1$, incoming packets are admitted into the buffer, and if $\alpha_i(t) = 0$ they are rejected. We define the residual rate $\rho_i(t)$ to be the rate by which packets are admitted into the buffer:
$$\rho_i(t) = r_i(t)\,\alpha_i(t). \tag{1}$$

Service controller
At any time instant, function $F_i$ has $m_i(t) \in \mathbb{Z}^+$ instances up and running. Each instance is capable of processing packets and corresponds to a virtual machine, a container, or a process running in the OS. It is possible to control the number of running instances by sending a reference signal $m_i^{\text{ref}}(t) \in \mathbb{Z}^+$ to the service controller. However, as explained in Section 1, it takes some time to start or stop instances, since an instantiation of the service is always needed. We denote this as the time overhead $\Delta_i$. Hence, the number of instances running in the $i$th function at time $t$ is
$$m_i(t) = m_i^{\text{ref}}(t - \Delta_i).$$
The time overhead is assumed to be symmetric here, but in the real world it is usually faster to start an instance than it is to stop one. However, for increased readability they are considered equal in this work. It should be noted that it is straightforward to extend the theory to account for an asymmetric time overhead.
An instance is expected to be able to process packets at an expected service rate of $\bar{s}_i$ pps. However, as described in Section 1, the true capacity of the instance will differ from the expected one, since there might be other loads running on the infrastructure (i.e., the physical machine). Hence, the true capacity of the $j$th instance in the $i$th function is given by
$$s_{i,j}^{\text{cap}}(t) = \bar{s}_i + \xi_{i,j}(t) \ \text{pps},$$
where $\xi_{i,j}(t)$ is the machine uncertainty for the $j$th instance in the $i$th function. It is bounded by
$$\xi_{i,j}(t) \in [\xi_i^{lb}, \xi_i^{ub}],$$
where $\xi_i^{lb}$ and $\xi_i^{ub}$ are lower and upper bounds of this machine uncertainty, assumed to be known. The machine uncertainty is also assumed to be fairly constant during the lifetime of the instance. Using this, one can express the true capacity of the $i$th function in the service-chain as
$$s_i^{\text{cap}}(t) = \sum_{j=1}^{m_i(t)} s_{i,j}^{\text{cap}}(t), \tag{3}$$
which together with the average machine uncertainty
$$\hat{\xi}_i(t) = \frac{1}{m_i(t)} \sum_{j=1}^{m_i(t)} \xi_{i,j}(t) \tag{4}$$
can be written as $s_i^{\text{cap}}(t) = m_i(t)\,\bigl(\bar{s}_i + \hat{\xi}_i(t)\bigr)$. Note that it would be natural to allow the time overhead $\Delta_i$ to also have some uncertainty. However, such uncertainty can be translated into a machine uncertainty.
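The capacity model above can be illustrated with a small numerical sketch. It assumes that the machine uncertainty enters additively in pps, which is how the bounds are expressed in the evaluation; the function names and the example numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean


def function_capacity(s_bar, xi):
    """Capacity of one function in pps: the sum over its running instances
    of (expected service rate + per-instance machine uncertainty).
    Assumption: the uncertainty is additive and expressed in pps."""
    return sum(s_bar + xi_j for xi_j in xi)


def average_uncertainty(xi):
    """Average machine uncertainty over the running instances (0 if none)."""
    return mean(xi) if xi else 0.0


# Hypothetical example: three instances, each expected to serve 150,000 pps.
s_bar = 150_000.0
xi = [-20_000.0, 5_000.0, 15_000.0]  # per-instance deviation in pps

cap = function_capacity(s_bar, xi)
# Same capacity expressed through the average uncertainty: m * (s_bar + xi_hat)
cap_via_mean = len(xi) * (s_bar + average_uncertainty(xi))
```

Both expressions give the same value, which is why the service controller only needs to track the average uncertainty per function rather than every instance separately.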

Processing of packets
The packets in the buffer are stored and processed in a FIFO manner. Once a packet reaches the head of the queue, the load balancer will distribute it to one of the instances in the function. Note that this is done continuously due to the fluid approximation. The rate by which the load balancer is distributing packets, and thus by which the function is processing packets, is defined as the service rate
$$s_i(t) = \begin{cases} s_i^{\text{cap}}(t) & \text{if } q_i(t) > 0, \\ \min\{\rho_i(t),\, s_i^{\text{cap}}(t)\} & \text{if } q_i(t) = 0, \end{cases}$$
where $\rho_i(t)$ is the residual rate given by Eq. 1 and $q_i(t)$ is the number of packets in the buffer:
$$q_i(t) = P_i(t) - S_i(t),$$
where $P_i(t) = \int_0^t \rho_i(x)\,dx$ is the total amount of packets that has been admitted into function $F_i$, and $S_i(t) = \int_0^t s_i(x)\,dx$ is the total amount of packets that has been served by function $F_i$. Furthermore, the total amount of packets that has reached the $i$th function is given by
$$R_i(t) = \int_0^t r_i(x)\,dx.$$
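As a rough illustration of the buffer dynamics described above, the following Python sketch advances the fluid queue of one function with a forward-Euler step. The function name, the fixed step length `dt`, and the rule that an empty buffer is served at the smaller of the admitted rate and the capacity are assumptions made for this sketch.

```python
def simulate_function(rho, s_cap, dt):
    """Forward-Euler simulation of one function's fluid buffer.

    rho[k]   : admitted (residual) rate in pps during step k
    s_cap[k] : capacity of the function in pps during step k
    dt       : step length in seconds
    Returns the service rate of each step and the queue length after it.
    """
    q = 0.0
    served, queue = [], []
    for r, c in zip(rho, s_cap):
        # Largest rate that the backlog plus the new arrivals can sustain
        # over this step; serving faster would make the queue negative.
        available = q / dt + r
        s = min(c, available)
        q = max(0.0, q + (r - s) * dt)
        served.append(s)
        queue.append(q)
    return served, queue


# Hypothetical burst: the input exceeds the capacity for two steps.
served, queue = simulate_function(
    rho=[100.0, 300.0, 300.0, 0.0, 0.0], s_cap=[200.0] * 5, dt=1.0
)
```

In this example the backlog builds up during the burst and is drained at full capacity afterwards, so every admitted packet is eventually served.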

Function delay
The time that a packet exiting function $F_i$ at time $t$ has spent inside that function is denoted the function delay $d_i(t)$:
$$d_i(t) = \inf\{\tau \ge 0 : P_i(t - \tau) \le S_i(t)\}.$$
The expected time that a packet entering the $i$th function at time $t$ will spend in the function before exiting is defined as the expected function delay $\hat{d}_i(t)$:
$$\hat{d}_i(t) = \inf\{\tau \ge 0 : P_i(t) \le S_i(t + \tau)\}. \tag{8}$$
Equation 8 can be interpreted as finding the minimum time $\tau \ge 0$ such that $S_i(t + \tau) = P_i(t)$, or in other words such that at time $t + \tau$ the function will have processed all the packets that had entered the function by time $t$.
Computing the expected function delay $\hat{d}_i(t)$ requires information about $m_i(t)$ and $\hat{\xi}_i(t)$ for the future, whereas computing the upper bound on the expected function delay, $d_i^{ub}(t)$, [...] It is therefore possible to compute the expected function delay $\hat{d}_i(t)$ whenever it is shorter than the time overhead $\Delta_i$ (which will be used later in Section 3 when deriving the admission controller and the service controller).
Note that the (expected) function delay does not distinguish between queueing delay and processing delay. In [17], Google profiled where the latency in a data center occurred and showed that 99% of the latency (≈85 μs) occurred somewhere in the kernel, the switches, the memory, or the application. It is very difficult to say exactly how much of this 99% is due to processing and how much to queueing; hence, they are considered together as the function delay.

Concatenation of functions
The $n$ functions in the service-chain are concatenated with the assumption of no loss of packets in the communication channel between them. Therefore, the input of function $F_i$ is exactly the output of function $F_{i-1}$:
$$r_i(t) = s_{i-1}(t), \quad i = 2, \ldots, n. \tag{9}$$
Finally, no communication latency between the functions is assumed. Accounting for it would be necessary should the different functions reside in different locations, e.g., different data centers. Doing so is straightforward: if such a communication latency (say $C$) were constant between the functions, one could account for it by decrementing the end-to-end deadline, $\tilde{D}^{\max} = D^{\max} - C$, and then use the framework developed in this paper.

Problem formulation
The goal of this paper is to derive a service controller and an admission controller that guarantee that packets that pass through the service-chain meet their E2E deadline. This should be done using as few resources as possible while still achieving as high throughput as possible. This is captured in a simple, yet intuitive utility function $u_i(t)$. Later in Section 3, the utility function is used to derive an automatic service and admission controller, denoted AutoSAC.

Utility function
The utility function measures the availability $a_i(t)$ and the efficiency $e_i(t)$ of each function in the service chain. The availability is defined as the ratio between the service rate and the input rate of the function, and the efficiency is defined as the ratio between the service rate and the capacity of the function:
$$a_i(t) = \frac{s_i(t)}{r_i(t - d_i(t))}, \qquad e_i(t) = \frac{s_i(t)}{s_i^{\text{cap}}(t)}, \tag{10}$$
where the input rate is evaluated at the time $t - d_i(t)$ at which the packets currently being served entered the function. The reason why $a_i(t)$ can grow greater than 1 is the buffer: it is possible to store packets for a short interval and then process them at a rate greater than the rate at which they arrived. However, it is not possible to have $a_i(t) > 1$ for an infinite amount of time. In practice, this interval is very small, and it is not possible to achieve $a_i(t) > 1$ for any significant period of time. A low availability corresponds to a large percentage of the incoming load being rejected by the admission controller, since there is not enough capacity to serve it. A low efficiency, on the other hand, corresponds to overprovisioning of resources. It is therefore difficult to achieve both high availability and high efficiency. The availability and efficiency are combined into a utility function $u_i(t)$:
$$u_i(t) = a_i(t) \cdot e_i(t). \tag{11}$$
Note that the utility function, as well as the availability and efficiency functions, has the good property of being normalized, making it easy to compare the performance of service-chains having different input loads. To compare the performance of service-chains of different lengths and over different time horizons, the average utility $U(t)$ is defined:
$$U(t) = \frac{1}{t} \int_0^t \frac{1}{n} \sum_{i=1}^{n} u_i(x)\,dx. \tag{12}$$
While the utility function (11) uses the product of the availability and efficiency, one might argue that they should not have equal weight when computing the utility. A natural choice to achieve that would be a convex combination of them:
$$u_i(t) = \lambda_i\,a_i(t) + (1 - \lambda_i)\,e_i(t), \tag{13}$$
where $\lambda_i \in [0, 1]$ corresponds to the relative importance of achieving a high availability versus a high efficiency. The method used in Section 3 to derive a control strategy using utility function (11) will also apply to the alternative utility function (13).
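To make the trade-off between availability and efficiency concrete, the sketch below evaluates both the product form and the convex-combination form for an over-provisioned and an under-provisioned function. The function names are hypothetical, the input rate passed in is assumed to be the one that applies to the packets currently being served, and the weight `lam` is assumed to multiply the availability term.

```python
def availability(s, r):
    """Ratio between the service rate and the input rate (both in pps)."""
    return s / r


def efficiency(s, s_cap):
    """Ratio between the service rate and the capacity (both in pps)."""
    return s / s_cap


def utility(s, r, s_cap):
    """Product form: availability times efficiency."""
    return availability(s, r) * efficiency(s, s_cap)


def utility_convex(s, r, s_cap, lam):
    """Convex combination; assumption: lam weights the availability."""
    return lam * availability(s, r) + (1.0 - lam) * efficiency(s, s_cap)


# Over-provisioned: all input is served, 20% of the capacity is idle.
u_over = utility(s=800_000.0, r=800_000.0, s_cap=1_000_000.0)
# Under-provisioned: capacity is fully used, 20% of the input is rejected.
u_under = utility(s=800_000.0, r=1_000_000.0, s_cap=800_000.0)
# With a low weight on availability, under-provisioning is penalized less.
u_under_weighted = utility_convex(800_000.0, 1_000_000.0, 800_000.0, lam=0.25)
```

Under the product form, the two situations are penalized symmetrically, whereas the weighted form lets an operator state which of the two failure modes is more costly.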

Controller design
In this section, an automatic service and admission controller (AutoSAC) is derived. Figure 5 illustrates an overview of the different parts of AutoSAC and the information flow it uses. It measures the incoming load, the current queue size, and the true performance in order to estimate how much service rate it will need as well as how long it will take an incoming packet to pass through the function. It also uses feedforward to functions down the chain in order to make them react faster to changes in the input load. For instance, when the $i$th function increases its service rate, it sends a signal to the $(i+1)$th function letting it know that in $\Delta_i$ time units, it will get an increase in incoming traffic rate. Finally, due to the time overhead needed to start new instances, there will be a need for admission control. In order not to discard unnecessarily many packets, AutoSAC uses feedback from the queue size and the true performance of the functions to estimate how much time it will take a new packet to pass through the function, and then bases the admission control on this estimate. The difficulty when deriving AutoSAC lies in the different time scales for starting/stopping instances, the E2E deadlines, and the rate of change of the input. They are all assumed to be of different orders of magnitude, given by Table 1. However, these timing assumptions will be exploited when deriving AutoSAC later.
The admission controller is derived in Section 3.1 and the service controller in Section 3.3. In Section 3.4, a short discussion of the properties of AutoSAC is presented.

Admission controller
Every request that enters the service chain has an end-to-end deadline $D^{\max}$. It has to pass through every function in the chain within this time. Furthermore, each function can impose a local deadline $D_i(t)$ for the packet entering the $i$th function at time $t$. One can therefore use either the local deadline to do a decentralized admission control at the entry of each of the functions in the chain, or the global deadline for a centralized admission control. In this work, we will use a decentralized approach, shown below, but will also derive a policy for a centralized admission control in Section 3.2.1; however, only the decentralized policy will be evaluated in Section 4.

Table 1 Timing assumptions for the end-to-end deadline, the change-of-rate of the input, and the overhead for changing the service rate. These timing assumptions are used when deriving the automatic service and admission controller
Parameter | Timing assumption
Long-term trend change of the input | 1 min to 1 h
Service-rate change overhead $\Delta_i$ | 1 s to 1 min
Request end-to-end deadline $D^{\max}$ | 1 μs to 100 ms

Decentralized admission control
For the decentralized admission control, each function can compare the local deadline with the upper bound $d_i^{ub}(t)$ of the expected delay for a new packet to pass through the function. If this worst-case expected delay is larger than the local deadline, the admission controller should drop the packet. This results in the following policy for the admittance flag $\alpha_i(t)$:
$$\alpha_i(t) = \begin{cases} 0 & \text{if } d_i^{ub}(t) > D_i(t), \\ 1 & \text{otherwise}, \end{cases} \tag{14}$$
where the upper bound on the expected function delay $d_i^{ub}(t)$ is given by
$$d_i^{ub}(t) = \inf\left\{\tau \ge 0 : P_i(t) \le S_i(t) + \int_t^{t+\tau} m_i(x)\,\bigl(\bar{s}_i + \xi_i^{lb}\bigr)\,dx\right\}. \tag{15}$$
This is the worst case of the expected delay given by Eq. 8, i.e., when every instance is processing packets at the lower bound of its possible service rate, hence leading to the upper bound on the expected delay. One should note here that in order to compute the upper bound (15) [...]. This is illustrated in Fig. 6, where $P_i(t)$ shows the cumulative amount of packets that has been let into the function, and $S_i(t)$ the cumulative amount of served packets up until time $t$. From time $t$ until $t + \Delta_i$, it shows a shaded blue region, highlighting that the exact service is uncertain in this area. However, it is possible to compute an [...]
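The decentralized policy can be sketched in a few lines of Python. The sketch assumes that the number of instances is known as a piecewise-constant schedule over the coming steps (which is what the time overhead makes possible) and that every instance serves at its lowest possible rate; the function names, the discretization step `dt`, and the numbers in the example are hypothetical.

```python
def delay_upper_bound(backlog, m_schedule, s_bar, xi_lb, dt):
    """Worst-case time in seconds needed to drain `backlog` packets when
    every instance runs at its lowest possible rate (s_bar + xi_lb).
    m_schedule[k] is the known number of instances during step k of
    length dt. Returns inf if the backlog is not drained within the
    known horizon."""
    remaining = backlog
    elapsed = 0.0
    for m in m_schedule:
        rate = m * (s_bar + xi_lb)   # worst-case capacity in this step
        drained = rate * dt          # packets served during this step
        if remaining <= drained:
            return elapsed + (remaining / rate if rate > 0 else 0.0)
        remaining -= drained
        elapsed += dt
    return float("inf")


def admit(backlog, m_schedule, s_bar, xi_lb, dt, local_deadline):
    """Admittance flag: admit only if the worst-case delay meets the
    local deadline."""
    bound = delay_upper_bound(backlog, m_schedule, s_bar, xi_lb, dt)
    return bound <= local_deadline


# Hypothetical example: 250 packets queued, one instance now and two
# instances from the next millisecond on, 100,000 pps worst case each.
bound = delay_upper_bound(250.0, [1, 2, 2], 150_000.0, -50_000.0, 0.001)
```

Because the bound uses only the backlog, the known instance schedule, and the lower bound on the machine uncertainty, it can be evaluated without predicting the future input.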

Centralized admission control
In contrast to the decentralized admission control, it might be advantageous to drop packets as early as possible in the service chain if there is a possibility that they will miss their global deadline, in order not to waste any resources on packets that are dropped later. To do so, one has to compare the expected worst-case end-to-end delay $D^{ub}(t)$ for a packet entering the chain at time $t$ with the global deadline $D^{\max}$, leading to the following policy:
$$\alpha_1(t) = \begin{cases} 0 & \text{if } D^{ub}(t) > D^{\max}, \\ 1 & \text{otherwise}. \end{cases} \tag{16}$$
Computing $D^{ub}(t)$ in Eq. 16 is straightforward, but before doing so, one has to compute the worst-case service rates for all the functions down the chain. At any time $x \ge t$ (with $t$ being the current time), the worst-case predicted service rate $s_i^{lb}(x)$ for functions $i = 1, 2, \ldots, n$ is [...], where $s_0^{lb}(x) = 0$, since we cannot predict the future input rate of the first function. With $t$ being the current time, the worst-case predicted cumulative service of function $i$ at time $x \ge t$ is then given by
$$S_i^{lb}(x) = S_i(t) + \int_t^{x} s_i^{lb}(y)\,dy. \tag{18}$$
Using this, the expected worst-case end-to-end delay $D^{ub}(t)$ is given by
$$D^{ub}(t) = \inf\{\tau \ge 0 : P_1(t) \le S_n^{lb}(t + \tau)\}, \tag{19}$$
where $P_1(t) = \int_0^t \rho_1(x)\,dx$ is the cumulative amount of requests that has been admitted into the first function. One should note that $S_n^{lb}(x)$ in Eq. 19 could be expressed in a very neat way using Network Calculus [18,19], but due to lack of space we decided not to introduce the theory of Network Calculus in this paper.

Service controller
The goal for the service controller is to find $m_i^{\text{ref}}(t)$ such that the utility function is maximized once the reference signal is realized in $\Delta_i$ time units, i.e., such that $u_i(t + \Delta_i)$ is maximized. In this section, it will be assumed that the utility function used is the one defined in Eq. 11; later in Section 3.3.1, the controller will be derived for the alternative utility function (13). Recall that the utility function (11) is given by
$$u_i(t) = a_i(t) \cdot e_i(t) = \frac{s_i(t)}{r_i(t - d_i(t))} \cdot \frac{s_i(t)}{s_i^{\text{cap}}(t)}.$$
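As explained in the introduction of this section, the input load is assumed to change relatively slowly over a time interval of a few milliseconds. Hence, one can approximate
$$r_i(t - d_i(t)) \approx r_i(t), \tag{20}$$
since the goal of both the admission controller and the service controller is to keep $d_i(t)$ on the order of milliseconds or less. Therefore, it is possible to approximate the utility function with
$$u_i(t) \approx \frac{s_i^2(t)}{r_i(t)\,s_i^{\text{cap}}(t)}.$$
Furthermore, the service rate $s_i(t)$ can be approximated to be either at the capacity of the function, $s_i^{\text{cap}}(t)$, or at the input rate $r_i(t)$:
$$s_i(t) \approx \min\{r_i(t),\, s_i^{\text{cap}}(t)\}, \tag{21}$$
where the min is used since the function cannot process packets at a faster rate than that at which they are entering the function for a prolonged period of time. Likewise, it cannot process packets at a rate higher than the capacity of the function when the input is higher than this. This leads to the utility function being approximated as
$$u_i(t) \approx \begin{cases} s_i^{\text{cap}}(t) / r_i(t) & \text{if } s_i^{\text{cap}}(t) \le r_i(t), \\ r_i(t) / s_i^{\text{cap}}(t) & \text{if } s_i^{\text{cap}}(t) > r_i(t). \end{cases}$$
With $s_i^{\text{cap}}(t)$ given by Eq. 3 and the average machine uncertainty $\hat{\xi}_i(t)$ given by Eq. 4, the utility function can finally be approximated as
$$u_i(t) \approx \begin{cases} \dfrac{m_i(t)\,(\bar{s}_i + \hat{\xi}_i(t))}{r_i(t)} & \text{if } m_i(t)\,(\bar{s}_i + \hat{\xi}_i(t)) \le r_i(t), \\[2ex] \dfrac{r_i(t)}{m_i(t)\,(\bar{s}_i + \hat{\xi}_i(t))} & \text{otherwise}. \end{cases}$$
Since the goal is to find $m_i^{\text{ref}}(t)$ in order to maximize $u_i(t + \Delta_i)$, one needs knowledge of $\hat{\xi}_i(t + \Delta_i)$ and $r_i(t + \Delta_i)$, which is not available. However, one can assume that the machine uncertainty will be fairly constant during $\Delta_i$ time units, such that $\hat{\xi}_i(t + \Delta_i) \approx \hat{\xi}_i(t)$. Furthermore, one has to estimate the future input rate to the function. For the first function, $F_1$, this can be done by using the derivative of the (preferably low-pass filtered) input rate:
$$\hat{r}_1(t) = r_1(t) + \Delta_1 \frac{d r_1(t)}{dt}.$$
For the succeeding functions, $i = 2, \ldots, n$, the input rate will change in a step-wise fashion, and one can therefore not approximate it with the expression above. However, since $r_i(t) = s_{i-1}(t)$ and $m_{i-1}(x)$ is known for $x \in [0, t + \Delta_{i-1}]$ (with $t$ being the current time), one could estimate the future input rate $\hat{r}_i(t)$ with [...] Note that [...] The reason is that if $\Delta_i > \Delta_{i-1}$, one does not have enough information to compute $s_{i-1}^{\text{cap}}(t + \Delta_{i-1})$. However, one can use the assumption that $\Delta_i \approx \Delta_{i-1}$.
Furthermore, since [...], one can summarize the predicted input $\hat{r}_i(t)$ as in Eq. 23 [...]. With this, one can define $\kappa_i(t) \in \mathbb{R}^+$ to be the real number of instances needed to exactly match the predicted incoming rate:
$$\kappa_i(t) = \frac{\hat{r}_i(t)}{\bar{s}_i + \hat{\xi}_i(t)}. \tag{24}$$
The control signal, i.e., the number of instances that should be started, $m_i^{\text{ref}}(t)$, can then be found by solving
$$m_i^{\text{ref}}(t) = \arg\max_{x \in \mathbb{Z}^+} \begin{cases} x / \kappa_i(t) & \text{if } x \le \kappa_i(t), \\ \kappa_i(t) / x & \text{if } x \ge \kappa_i(t), \end{cases}$$
where $x \in \mathbb{Z}^+$ is the number of instances and $\kappa_i(t)$ is given by Eq. 24. Here, one can see that the first case of the above equation is maximized when $x$ is as large as possible, but since this case is only valid when $x \le \kappa_i(t)$, it leads to $x = \lfloor \kappa_i(t) \rfloor$. Similarly, the second case is maximized when $x$ is as small as possible, and since this case is valid for $x \ge \kappa_i(t)$, it leads to $x = \lceil \kappa_i(t) \rceil$. Comparing the two candidates, $\lfloor \kappa_i(t) \rfloor / \kappa_i(t)$ against $\kappa_i(t) / \lceil \kappa_i(t) \rceil$, leads to the final control law:
$$m_i^{\text{ref}}(t) = \begin{cases} \lfloor \kappa_i(t) \rfloor & \text{if } \lceil \kappa_i(t) \rceil \cdot \lfloor \kappa_i(t) \rfloor > \kappa_i^2(t), \\ \lceil \kappa_i(t) \rceil & \text{otherwise}, \end{cases} \tag{25}$$
where again $\kappa_i(t) = \hat{r}_i(t) / (\bar{s}_i + \hat{\xi}_i(t))$ is the real number of machines that is necessary to exactly match the predicted incoming traffic.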
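The rounding rule of the service controller can be sketched as follows. The sketch assumes the approximate utility $\min(x/\kappa, \kappa/x)$ for $x$ instances discussed above, takes the predicted input rate and the average machine uncertainty as given, and uses hypothetical function names.

```python
import math


def instances_needed(r_hat, s_bar, xi_hat):
    """kappa: real-valued number of instances that would exactly match the
    predicted input rate r_hat, given the expected per-instance rate s_bar
    and the measured average machine uncertainty xi_hat (all in pps)."""
    return r_hat / (s_bar + xi_hat)


def reference_instances(kappa):
    """Round kappa down or up, whichever gives the higher approximate
    utility: floor(kappa)/kappa when under-provisioning versus
    kappa/ceil(kappa) when over-provisioning."""
    lo, hi = math.floor(kappa), math.ceil(kappa)
    # floor/kappa > kappa/ceil  <=>  floor * ceil > kappa^2
    return lo if lo * hi > kappa * kappa else hi


# Hypothetical example: 1,000,000 pps predicted, instances measured to
# deliver 125,000 pps on average.
kappa = instances_needed(1_000_000.0, 150_000.0, -25_000.0)
m_ref = reference_instances(kappa)
```

Note that the switching point between rounding down and rounding up is the geometric mean of the two neighboring integers rather than their midpoint, so for example 2.4 is rounded down while 2.45 is already rounded up.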

Alternative utility function
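Using the same method described in Section 3.3, one can derive a control law for the alternative utility function (13):
$$u_i(t) = \lambda_i\,a_i(t) + (1 - \lambda_i)\,e_i(t).$$
By using the approximation (20) for the input rate, (21) for the service rate, (23) for predicting the input rate, and finally (3) for estimating the maximum capacity along with Eq. 4 for the machine uncertainty, one arrives at the following control law:
$$m_i^{\text{ref}}(t) = \arg\max_{x \in \mathbb{Z}^+} \begin{cases} \lambda_i\,\dfrac{x}{\kappa_i(t)} + (1 - \lambda_i) & \text{if } x \le \kappa_i(t), \\[2ex] \lambda_i + (1 - \lambda_i)\,\dfrac{\kappa_i(t)}{x} & \text{if } x \ge \kappa_i(t), \end{cases}$$
where $\kappa_i(t) = \hat{r}_i(t) / (\bar{s}_i + \hat{\xi}_i(t))$. One can see that the upper case is maximized when $x$ is as large as possible within that case, i.e., with $x = \lfloor \kappa_i(t) \rfloor$, while the lower case is maximized when $x$ is as small as possible, i.e., with $x = \lceil \kappa_i(t) \rceil$. The remaining question is then which of the two cases yields the largest utility.
Fortunately, this is easy to evaluate, resulting in the final control law for the alternative utility function:
$$m_i^{\text{ref}}(t) = \begin{cases} \lfloor \kappa_i(t) \rfloor & \text{if } \lambda_i\,\dfrac{\lfloor \kappa_i(t) \rfloor}{\kappa_i(t)} + (1 - \lambda_i) > \lambda_i + (1 - \lambda_i)\,\dfrac{\kappa_i(t)}{\lceil \kappa_i(t) \rceil}, \\[2ex] \lceil \kappa_i(t) \rceil & \text{otherwise}, \end{cases} \tag{26}$$
where again $\kappa_i(t) = \hat{r}_i(t) / (\bar{s}_i + \hat{\xi}_i(t))$. Comparing the two control laws (25) and (26), one can see that the alternative control law (26) is equivalent to (25) when the efficiency and availability are considered equally important, i.e., when $\lambda_i = 1/2$.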
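A minimal sketch of the weighted rounding rule is given below. It assumes that the weight `lam` multiplies the availability term and `1 - lam` the efficiency term of the convex combination, and that availability is approximately 1 when over-provisioning and efficiency approximately 1 when under-provisioning; the function name is hypothetical.

```python
import math


def reference_instances_weighted(kappa, lam):
    """Choose floor(kappa) or ceil(kappa), whichever gives the higher
    approximate weighted utility. Assumption: lam in [0, 1] weights the
    availability and (1 - lam) the efficiency."""
    lo, hi = math.floor(kappa), math.ceil(kappa)
    if lo == hi:
        return lo  # kappa is an integer: the match is exact
    # Rounding down: availability lo/kappa, efficiency approximately 1.
    u_lo = lam * lo / kappa + (1.0 - lam)
    # Rounding up: availability approximately 1, efficiency kappa/hi.
    u_hi = lam + (1.0 - lam) * kappa / hi
    return lo if u_lo > u_hi else hi


# With equal weights the rule agrees with the product-form rounding.
m_equal = reference_instances_weighted(2.4, lam=0.5)
# Favoring availability rounds the same kappa up instead.
m_avail = reference_instances_weighted(2.4, lam=0.9)
```

Under these assumptions, a weight close to 1 makes the controller round up almost always (never reject load), and a weight close to 0 makes it round down almost always (never idle capacity).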

Properties of AutoSAC
There are several interesting properties captured by the admission controller and service controller presented in this section. First of all, the admission controller (14) ensures, by design, that every packet that is admitted into a function, and thus exits the function, meets its deadline. Therefore, no packets that exit the service-chain will miss their end-to-end deadline.
The service controller given by Eq. 25 captures both the feedback from the true performance of the instances (when computing $\hat{\xi}_i(t)$) and the feedforward information about future input coming from functions earlier in the service-chain (when computing $\hat{r}_i(t)$). This makes it robust against machine uncertainties but also ensures that it reacts fast to sudden changes in the input. For instance, given a service-chain of six functions, function $F_5$ will know that in $\Delta_4$ time units, $F_4$ will have $m_4^{\text{ref}}(t)$ instances running and can thus start as many instances as needed to process this new load.

Evaluation
In this section, the automatic service and admission controller (AutoSAC) developed in Section 3 is evaluated. First, in Section 4.1, we illustrate how a randomly generated service chain of three functions performs when it is given a 5-h traffic trace. Later, in Section 4.2, AutoSAC is compared with two other "state-of-the-art" methods for scaling cloud services. The comparison is done using a Monte Carlo simulation where the parameters of a five-function service chain are randomly generated and then simulated, again using a real traffic trace as input.
The real-world trace of traffic data used as input was gathered over 120 hours from a port in the Swedish University NETwork (SUNET) and then normalized to have a peak of 10,000,000 packets per second, as shown in Fig. 1. The simulation was written in the open-source language Julia [20]. The code and traffic trace used for this simulation are provided on GitHub. 1

Example chain
For this example, we use a service chain with three functions, where the E2E deadline was set to 30 ms, which in turn was split into local deadlines of 10 ms for each function. The other parameters (i.e., $\bar{s}_i$, $\Delta_i$, $\xi_i^{lb}$, and $\xi_i^{ub}$) for every function in the service-chain are generated randomly. The expected service rate $\bar{s}_i$ was chosen uniformly at random from the interval [100,000, 200,000] pps. The time overhead $\Delta_i$ was drawn uniformly at random from the interval [30, 120] seconds. The machine uncertainty was chosen to be in the range of ±30% of the expected service rate $\bar{s}_i$. The lower bound of the machine uncertainty was drawn from the interval $[-0.3\bar{s}_i, 0]$ pps and likewise, the upper bound was drawn from $[0, 0.3\bar{s}_i]$ pps.
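The parameter generation described above can be sketched as follows. The simulation itself was written in Julia; this Python version is only an illustration, with hypothetical class and function names. It assumes that the uncertainty bounds are also drawn uniformly (the text states the intervals but not the distribution) and that the E2E deadline is split equally among the functions.

```python
import random
from dataclasses import dataclass


@dataclass
class FunctionParams:
    s_bar: float     # expected service rate per instance [pps]
    delta: float     # time overhead for starting/stopping instances [s]
    xi_lb: float     # lower bound of the machine uncertainty [pps]
    xi_ub: float     # upper bound of the machine uncertainty [pps]
    deadline: float  # local deadline [s]


def random_chain(n, e2e_deadline, rng):
    """Draw the parameters of a service chain with n functions."""
    chain = []
    for _ in range(n):
        s_bar = rng.uniform(100_000, 200_000)
        chain.append(FunctionParams(
            s_bar=s_bar,
            delta=rng.uniform(30, 120),
            xi_lb=rng.uniform(-0.3 * s_bar, 0.0),  # assumed uniform
            xi_ub=rng.uniform(0.0, 0.3 * s_bar),   # assumed uniform
            deadline=e2e_deadline / n,             # equal split
        ))
    return chain


# Three functions sharing a 30 ms end-to-end deadline; seeded for
# reproducibility.
chain = random_chain(3, 0.030, random.Random(1))
```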
In Fig. 7, one can see how the service chain scales the number of instances up/down in order to react to the input load. In Fig. 8, one can see how the average utility changes over the course of the simulation. One thing to notice is that the average utility over the service chain remains stable above 0.95 despite large variations in the input.

Comparing AutoSAC with state-of-the-art
In this section, we will evaluate AutoSAC through a Monte Carlo simulation with $15 \cdot 10^4$ runs, where it is compared against two state-of-the-art methods for auto-scaling VMs in industry: dynamic auto-scaling (DAS) and dynamic over-provisioning (DOP). However, since these two methods do not use any admission control, they are also augmented with the admission controller presented in Section 3.1. The two augmented methods are denoted by "DAS with AC" and "DOP with AC." Hence, in total, the method presented in Section 3 is compared with four other methods.

Dynamic auto-scaling (DAS)
Fig. 7 Simulation of a service chain with three functions where the parameters of each function were randomly generated. One can see how each function reacts to changes of the input rate and automatically scales the number of instances of each function up and down. Feedforward between the functions in the chain ensures a fast reaction since they can scale up/down before the changes occur in their input
This method is currently being offered to customers using Amazon Web Services [21]. It allows the user to monitor different metrics (e.g., CPU utilization) of their VMs using CloudWatch. One can then use it together with their auto-scaling solution to achieve dynamic auto-scaling. This allows the user to scale the number of VMs as a function of these metrics. One should note that the CPU utilization can be considered the same as the efficiency metric $e_i(t)$ defined in Eq. 10. For the Monte Carlo simulation, the following rules were used:
- add a VM if the efficiency is above 99%,
- remove a VM if the efficiency is below 95%.
This might seem like a high and tight interval, but it is necessary in order to achieve a high utility.
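The two threshold rules used for DAS in the simulation can be sketched as a single update step. The function name is hypothetical, and keeping at least one VM running is an assumption added here so that the function never scales to zero.

```python
def das_step(m, efficiency, upper=0.99, lower=0.95):
    """One dynamic auto-scaling decision for a function with m VMs.
    Adds a VM if the efficiency is above `upper`, removes one if it is
    below `lower` (assumption: never fewer than one VM), else keeps m."""
    if efficiency > upper:
        return m + 1
    if efficiency < lower and m > 1:
        return m - 1
    return m


# Hypothetical sequence of measured efficiencies for one function.
m = 4
history = []
for e in (0.995, 1.0, 0.97, 0.90, 0.90):
    m = das_step(m, e)
    history.append(m)
```

Since each decision changes the number of VMs by at most one, and each change only takes effect after the time overhead, this scheme reacts slowly to large changes in the input.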

Dynamic over-provisioning (DOP)
A downside with DAS is that it reacts slowly to sudden changes in the input. A natural alternative would therefore be to instead do dynamic over-provisioning, where one measures the input to each function and allocates virtual resources such that there is an expected over-provision of 10%.
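A minimal sketch of such an over-provisioning rule is given below. It assumes that the allocation is based on the expected per-instance service rate, that the result is rounded up to a whole number of VMs, and that at least one VM is always kept; the function name is hypothetical.

```python
import math


def dop_instances(r, s_bar, margin=0.10):
    """Number of VMs such that the expected capacity exceeds the measured
    input rate r (pps) by `margin`, given the expected per-instance
    service rate s_bar (pps). Assumptions: round up, keep at least one."""
    return max(1, math.ceil((1.0 + margin) * r / s_bar))


# Hypothetical example: 1,000,000 pps measured, 150,000 pps per VM.
m_dop = dop_instances(1_000_000.0, 150_000.0)
```

Unlike the feedback used in AutoSAC, this rule relies on the expected service rate only, so any deviation of the true performance from the expected one has to be absorbed by the fixed margin.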

Monte Carlo simulation
The five methods are compared using a Monte Carlo simulation with 15 · 10^4 runs. For every run, 1 h of input data was randomly selected from the total of 120 h shown in Fig. 1. Furthermore, in every run, a new service chain with five functions was generated using the method described in Section 4.1. The end-to-end deadline was set to 50 ms, which in turn was split into local deadlines of 10 ms for each function.
The evaluation of the Monte Carlo simulation is based on the average utility U(t). Since a packet that misses its deadline (which is possible when using DAS or DOP) is considered useless, it is counted as a dropped packet when exiting the function. It therefore lowers the availability metric and, in turn, the utility. Should all packets miss their deadlines in function F_i during a time interval τ, then a_i(t) = 0 for all t ∈ τ, i.e., the availability is evaluated as 0 during this interval since the output of the function is considered useless.
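The per-run setup can be sketched in Python as follows. The equal split of the end-to-end deadline follows the text (50 ms over five functions gives 10 ms each); drawing the start of the 1 h window uniformly over the trace is an assumption, since the text only says the window was randomly selected. All names are hypothetical.

```python
import random

TRACE_HOURS = 120.0            # total length of the input trace
WINDOW_HOURS = 1.0             # input data used per run
NUM_FUNCTIONS = 5              # functions in each generated service chain
END_TO_END_DEADLINE_MS = 50.0


def sample_window_start(rng, trace_hours=TRACE_HOURS, window_hours=WINDOW_HOURS):
    """Start time (in hours) of a window lying fully inside the trace.

    Uniform sampling is an assumption made for this sketch.
    """
    return rng.uniform(0.0, trace_hours - window_hours)


def split_deadline(end_to_end_ms, num_functions):
    """Split the end-to-end deadline equally into local deadlines."""
    return [end_to_end_ms / num_functions] * num_functions


rng = random.Random(0)  # fixed seed so a run can be reproduced
start = sample_window_start(rng)
local_deadlines = split_deadline(END_TO_END_DEADLINE_MS, NUM_FUNCTIONS)
```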
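The deadline-miss accounting can be illustrated with a small Python sketch. It assumes, purely for illustration, that availability over an interval is approximated by the fraction of packets leaving a function within its local deadline; the paper's own definition of a_i(t) is given earlier and may differ in form. The function name and the convention for an empty interval are hypothetical.

```python
def interval_availability(sojourn_times_ms, local_deadline_ms):
    """Fraction of packets that leave the function on time.

    Packets exceeding the local deadline are counted as dropped.
    An interval with no packets is treated as fully available
    (an assumption of this sketch).
    """
    if not sojourn_times_ms:
        return 1.0
    on_time = sum(1 for t in sojourn_times_ms if t <= local_deadline_ms)
    return on_time / len(sojourn_times_ms)
```

With a 10 ms local deadline, an interval in which every packet takes longer than 10 ms yields 0, matching the case described in the text.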

Results
The mean of the average utility U(t) over all simulation runs is presented in Fig. 9 for each of the five methods. One can see that AutoSAC achieves a utility that is 30-40% better than that of DAS and DOP. The main reason is that DAS and DOP lack admission control, so packets miss their deadlines, which eventually results in a low utility.
When augmenting DAS and DOP with the admission controller derived in Section 3.1, the performance is increased by 20-40%, purely as a result of not having these sudden drops in performance. However, AutoSAC still performs 5-10% better, due to its feedforward property, which gives it a faster reaction time to changes in the input, as well as its feedback property, which leads to better prediction and robustness against the machine uncertainties.
Fig. 8 Average utility for the entire service chain simulated in Section 4. The input is the same as for Fig. 7. One can see that the average utility remains above 0.95 throughout the simulation with small drops when there are large changes in the input rate. However, due to the feedback and feedforward properties, AutoSAC is able to quickly react to these changes and quickly recover a high utility
Fig. 9 Results from the Monte Carlo simulation. AutoSAC performs 30-40% better than DAS and DOP. The main reason is the admission controller used in AutoSAC. When augmenting DAS and DOP with this admission controller, their performance is increased by more than 20%. However, AutoSAC still outperforms the augmented methods by 5-10% since it uses feedforward, making it faster to react to input changes, as well as feedback making it more robust to machine uncertainties

Summary
In this work, we have developed a mathematical model for NFV Forwarding Graphs residing in a cloud environment. The model captures, among other things, the time needed to start/stop virtual resources (e.g., virtual machines or containers) and the uncertainty in the performance of the virtual resources, which can deviate from the expected performance due to other tenants running loads on the physical infrastructure. The packets that flow through the forwarding graph must be processed by each of the virtual network functions (VNFs) within some end-to-end deadline.
A utility function is defined to compare the performance of different methods for controlling NFV Forwarding Graphs. The utility function is also used to derive an automatic service and admission controller (AutoSAC) in Section 3. It ensures that packets that exit the forwarding graph meet their end-to-end deadline. The service controller uses feedback from the actual performance of the virtual resources, making it robust against uncertainties and deviations from the expected performance. Furthermore, it uses feedforward between the VNFs, making it fast to react to changes in the input load.
In Section 4, AutoSAC is evaluated and compared against four other methods in a Monte Carlo simulation with 15 · 10^4 runs. The input load for the simulation is a real-world trace of traffic data gathered over 120 h. The traffic is normalized to have a peak of 10,000,000 packets per second. AutoSAC is shown to perform better than what is offered in the cloud industry today. We also show that augmenting the industry methods with the admission controller derived in Section 3 significantly increases their performance.
It would be interesting to extend this work by investigating how to derive a controller when the true performance is unknown or when the time-overhead needed to start virtual resources is unknown. Moreover, it would be interesting to investigate how to control a forwarding graph that has forks and joins, i.e., a graph structure rather than just a chain.