Commit 2c225d38 authored by msurl

update

parent 6ba2a45e
\section{Abstract}\raggedbottom
Maximizing photosynthetic output is one of many objectives of a plant. In this thesis we examine a method to predict an optimal venation pattern for leaves based on the minimal number of leaf cells that have to be transformed into vein cells to supply the entire leaf with nutrients and water. The model focuses only on the number of cells and disregards other aspects of the vascular system, such as the vein hierarchy. We model the vascular system with a special variant of the Minimum Dominating Set Problem, which we call the Minimum Connected Rooted $k$-hop Dominating Set Problem, and implement it using Integer Linear Programming. Our results show that our implementation is not capable of solving larger instances in a reasonable amount of time. On the instances that represent plant leaves, our implementation performs worse than an implementation in Answer Set Programming. We present a detailed comparison of both versions on instances of different structure and size, and we analyze why the Integer Linear Programming implementation performs poorly on the leaf graphs. The tests also revealed that on randomly generated graphs the Integer Linear Programming implementation outperformed the Answer Set Programming implementation.
\pagebreak
......@@ -52,7 +52,8 @@
% is intended, then use the following line with
% englisch for English
% deutsch for German
\newcommand{\sprache}{deutsch}
%\newcommand{\sprache}{deutsch}
\newcommand{\sprache}{englisch}
% Set here whether the thesis is a bachelor's
% or a master's thesis (comment out the one that does not apply!):
......@@ -82,9 +83,10 @@
\input{introduction}
\input{preliminaries}
\input{methods}
%\input{definitions}
%\input{ilp}
%\input{implementation}
\input{implementation}
\input{results}
\input{discussion}
\input{conclusion}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% ENDE TEXTTEIL %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
......
\section{Conclusion}\raggedbottom
Since we adopted the model from \citet{myky} and only implemented it in another framework, the model's shortcomings are still present. It disregards several aspects that play a role in the venation pattern of real plants.
Additionally, our implementation in its current version is not capable of generating optimal solutions for the leaf graphs in a reasonable amount of time. The ASP implementation performs better on these graphs and is therefore the better choice for implementing the model. Even after we evaluated different approaches to reduce the runtime, the ASP implementation performed better. Nevertheless, there are still approaches that remain to be evaluated.
The next step for the ILP implementation should be to devise a symmetry breaker that reduces the number of symmetrical unconnected integer solutions that are generated in the iteration process. Additionally, it should be evaluated which types of constraints that are otherwise added during the process can be added in advance. Another important point is to find heuristics that determine sufficient lower bounds faster.
However, since the ASP implementation outperformed the ILP implementation, it is also reasonable to implement the suggestions from \citet{myky} to improve the ASP implementation further.
\pagebreak
......@@ -7,7 +7,7 @@
\begin{definition}[Neighborhood]
Let $G = (V,E)$ be an undirected graph and let $N(v)$ denote the neighborhood of a vertex $v$. Formally, $N(v)$ is described as follows: \[w \in N(v) \Leftrightarrow (v,w) \in E\]
\end{definition}
(Maybe leaf base case out as the s.t. part ist different from k-hop version. So after introducing in the implementation part it would be replaced.)
(Maybe leaf base case out as the s.t. part is different from k-hop version. So after introducing in the implementation part it would be replaced.)
\begin{definition}[Dominating Set]
Given an undirected graph $G = (V,E)$, a Dominating Set is a subset $DS \subseteq V$ such that each vertex $v \in V$ is either included in the Dominating Set or adjacent to at least one vertex which is included in the Dominating Set. So for a Dominating Set $DS$ the following statement holds:
\[\forall v \in V \setminus DS: \exists u \in DS, u \in N(v)\]
......@@ -19,7 +19,7 @@ The neighborhood of a single vertex $N(v)$ is defined above. Let the neighborhoo
Let $k \in \mathbb{N}$.
With help of this definition the k-neighborhood $N_k(v)$ of a single vertex $v \in V$ can recursively be defined as:
\[N_k(v) := N(N_{k-1}(v)) \setminus \{v\}\]
wheras $N_1(v) = N(v)$. So $N_k(v)$ is a set of all vertices which can be reached with at most $k$ steps starting from $v$.
whereas $N_1(v) = N(v)$. So $N_k(v)$ is a set of all vertices which can be reached with at most $k$ steps starting from $v$.
\end{definition}
\begin{definition}[k-hop Dominating Set]
......
\section{Discussion}\raggedbottom
As already mentioned, and as \citet{myky} stated, our model has some shortcomings and disregards aspects that influence an optimal venation pattern in real plants. We only focus on minimizing the number of cells that have to be transformed into vein cells, under the condition that the entire leaf can still be supplied with water and nutrients. In this way the number of photosynthetically active cells, and thereby their output, should be maximized. Our model completely disregards the vein hierarchy and, among other things, the fact that environmental circumstances also influence the venation pattern \citep{bio_veinh}. The fact that plants try to minimize their total branch length and the transport distance for nutrients \citep{bio_netw} is also disregarded.
As our results showed, neither the ILP implementation nor the ASP implementation is capable of generating solutions for our leaf graphs in a reasonable amount of time. The ILP implementation is incapable of finding an optimal solution in under 1000 seconds for the instance \textit{middle-leaf}, which has only 62 nodes, with parameter $k = 1$. The ASP implementation, on the other hand, needed only 154 seconds to find an optimal solution. However, both versions find an appropriate upper bound in less than 1 second. The rest of the solving time is spent entirely on closing the gap to the lower bound.
The instance \textit{GNM\_ 500\_ 62375}, in contrast, has 500 nodes, but the ILP implementation nevertheless finds a solution in 154 seconds, whereas the ASP version could not find an optimal solution within 1000 seconds. As the results show the same difference in runtime on other rather sparse and large random graphs, the ILP version seems to perform better on random graphs in general. As the results for the random graphs indicate, our ILP implementation might be a reasonable approach for other problems which can be modelled with the \textit{Minimum Connected (rooted) k-hop Dominating Set}, depending on the structure of the input instances.
Just as \citet{myky} observed for the ASP implementation that an increasing parameter $k$ reduces the runtime significantly, our tests showed the same effect for the ILP implementation. For the random graphs and parameter $k = 2$ or $k = 3$ every instance could be solved in less than 1 second. It should also be noted that for most of these instances only a few or even no constraints needed to be added lazily. For most instances the optimal solutions in this case consisted only of the single root node or contained just a few additional nodes.
These results cannot be applied unconditionally to other real-world problems, as their graphs can have specific structures that differ from random graphs.
An increasing $k$ also improved the runtime on our leaf graphs. However, even for $k = 2$ and $k = 3$ the instances \textit{maple} and \textit{asymmetric} could not be solved in under 1000 seconds. We cannot arbitrarily increase the parameter $k$ in our model, as vein cells must be within a range of 2-3 cells from mesophyll cells \citep{nachschauen_auf_welcher_seite_und_aus_references_übernehmen}.
The runtime on the grid graphs also went down with increasing $k$. For these graphs an optimal solution could be found in under 1000 seconds even with $k = 1$. Admittedly, all instances had only 64 nodes. As the time to find an optimal solution for the instance \textit{GRID\_ 8\_ 8} was 775 seconds, it can be assumed that the runtime exceeds 1000 seconds for larger instances.
Using the \textit{intermediate node constraints} reduced the runtime the most. However, in most cases these constraints added unnecessary nodes to a solution which are not included without them. Nonetheless, this method could be considered for creating approximate solutions. For this purpose it would be desirable to formally prove the maximal number of extra nodes in relation to an optimal solution. However, our results show, at least by example, that in most cases appropriate upper bounds were established in rather short time even without this additional constraint.
For the instance \textit{middle-leaf}, for example, both the ILP implementation and the ASP implementation found an upper bound in less than 1 second that does not differ from an optimal solution. Thus an approximation for the upper bound does not seem to be necessary. In fact, a heuristic that generates an appropriate lower bound is much more desirable, as closing the gap to the upper bound takes the major share of the time. Even for the rather large instance \textit{maple} an upper bound that does not differ from the optimal solution using the \textit{intermediate node constraint} is found after 29 seconds. At best this constraint could be used to evaluate how good the upper bounds from the solving process are. But for this purpose an approximation factor would be necessary. For the instance \textit{asymmetric} an optimal solution could not be found in under 1000 seconds even using this constraint. Accordingly, there is still need for optimisation to create a satisfying implementation even if this constraint is used.
According to the current state of knowledge, using vertex separators seems to be the best method to induce connectivity in graph-theoretical problems. Alternative approaches from \citet{mtz} or \citet{klau} were not as successful for the corresponding problems as formulations that use vertex separators. Especially for the Steiner tree problem \citet{fischetti_steiner_t} achieved good results compared to other approaches. \citet{bomersbach} also achieved good results for the Connected Maximum Coverage Problem. In \citet{forrest} and \citet{fault_tolerant} this method was evaluated as promising.
For our problem, and especially for the graphs that represent our leaves, this method was not satisfying. The same applies to square grid graphs. We assume that the high number of unconnected integer solutions generated in the iteration process is the crucial factor. These solutions are most likely symmetrical in some manner, such that an appropriate symmetry breaker could reduce the runtime drastically.
In general the ASP implementation performed better on the graphs representing our leaves. In the conclusion of her thesis \citet{myky} mentioned different ways in which the ASP implementation can be improved. As this implementation has performed better than the ILP implementation so far, it might be more reasonable to improve the ASP implementation rather than the ILP implementation.
Another aspect that our tests revealed is that the runtime is relatively high especially on instances with a rather large gap between the size of an optimal unconnected solution and the size of an optimal connected solution. This is probably related to the fact that in such cases many constraints were added lazily, which indicates a high number of unconnected integer solutions. For the instances where the gap was rather tight the runtime was much better. In the tests from \citet{myky} an ILP implementation for the unconnected Minimum $k$-hop Dominating Set could create solutions much faster than the ASP implementation. This specific superiority is reflected here: valid solutions could be generated quickly, and it only needed to be verified whether the solution is connected, with only a few constraints having to be added otherwise.
The density has also proven to be a parameter which highly influences the runtime. On sparse graphs both the ILP implementation and the ASP implementation performed rather poorly. Among the random graphs, rather sparse instances with 250 and 500 nodes could not be solved in under 1000 seconds with parameter $k = 1$. Our leaf graphs are all very sparse, such that this effect plays a role as well. With increasing size the density of our graphs even decreases.
Adding vertex separator constraints in advance had a measurable influence on the runtime. Unfortunately, this effect alone could not improve the runtime to the point where a satisfying implementation of our model could be created. Although many constraints were added in advance, many constraints were still added in the iteration process. It could make sense to identify the types of constraints that are still added in the solution process, so that adding them beforehand prevents unnecessary iterations. This might lead to a better runtime.
Another approach to improve the implementation would be to add violated constraints not only after integer solutions are found but already when LP relaxations are computed. This approach was used in \citet{forrest} and led to very good results with regard to the LP bound.
\pagebreak
\section{Implementation} \raggedbottom
\subsection{Softwarestack?}
\subsection{General?(better name)}
Our implementation is node based, which means that we only use decision variables for nodes and not for edges.
So we assign a variable $x_v \in \{0,1\}$ to every $v \in V$, where $x_v = 1 \Leftrightarrow v \in DS$.
(Maybe leaf classical dominating Set out?)
\subsection{Minimum Dominating Set}
As we try to minimize the number of vertices in the dominating set our ILP is given as:(Obvious/ useless phrase?) \\
\textit{objective target}:
\begin{equation} \label{obj}
\min \lbrace \sum_{v \in V}{x_v} \rbrace
\end{equation}
\textit{subject to:}
\begin{equation} \label{base}
\sum_{w \in N(v)}{x_w} + x_v \geq 1, \forall v \in V
\end{equation}
The family of inequalities \eqref{base} is an ILP Version of the formal definition. It says that each vertex or at least one of its neighbors has to be included in the dominating set.
\subsection{Minimum $k$-hop Dominating Set}
The objective target for this problem is the same as \eqref{obj}. But the family of inequalities \eqref{base} is not valid for this case. Instead another family of inequalities is valid: \\
\begin{equation} \label{khop}
\sum_{w \in N_k(v)}{x_w} + x_v \geq 1, \forall v \in V
\end{equation}
This family of inequalities models the requirement that each vertex or at least one member of its $k$-neighborhood has to be included in the dominating set.
(If classical dominating set is left out, maybe mention case k = 1)
\subsection{Connectivity}
To enforce connectivity (using ILP) there are different approaches.
\subsubsection{Vertex separators}
One approach is to use so-called vertex separators. \citet{bomersbach} and \citet{fischetti_steiner_t} used this approach to create ILP-based algorithms for other graph-theoretical optimisation problems which require the solution to be connected. (In both papers it is mentioned that those separators define the connected subgraph polytope. Maybe mention as well?) In both publications this approach proved to be very successful, as their algorithms outperformed previous state-of-the-art algorithms. (Maybe too general and too strong?) So it seemed reasonable to us to use it as well. (Unnecessary phrase?)\\
\\
Let $v,w \in V$. A $v$-$w$-separator (Maybe textit?/ Maybe other notation) is a subset $S_{v,w} \subset V$ such that $G[V \setminus S_{v,w}]$ has no path between $v$ and $w$. A minimal $v$-$w$-separator $S_{{v,w}_{min}}$ is a $v$-$w$-separator from which no vertex can be removed without it ceasing to separate $v$ and $w$. (Maybe sounds "too dumb"? Look into explanation of other papers.) Let $S(v,w)$ (Use different notation. This is misleading) denote the family of all minimal $v$-$w$-separators. \\
The following family of inequalities taken from \citep{bomersbach} is used to enforce connectivity:
\begin{equation} \label{sep}
x_v + x_w \leq \sum_{u \in S_{v,w}}{x_u} + 1, \forall v, w \in V, v \neq w, \forall S_{v,w} \in S(v,w)
\end{equation}
These inequalities require that, for each pair of vertices $v$ and $w$, if both vertices are included in the dominating set, at least one vertex of each minimal separator $S_{v,w}$ also has to be included. \\
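A small example may make \eqref{sep} concrete. Assume a cycle on four vertices with edges $(a,b)$, $(b,c)$, $(c,d)$ and $(d,a)$ (a hypothetical graph used only for illustration). The only minimal $a$-$c$-separator is $\{b,d\}$, so \eqref{sep} yields
\[x_a + x_c \leq x_b + x_d + 1.\]
If both $a$ and $c$ are selected, the left-hand side equals $2$ and the inequality forces $x_b + x_d \geq 1$, that is, at least one vertex connecting $a$ and $c$ must be selected as well. If at most one of them is selected, the inequality holds regardless of $x_b$ and $x_d$. \\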
In contrast to the problem from \citet{bomersbach} we have a predefined root node which must be part of the solution. So in our case it is sufficient to only use vertex separators that separate the connected component containing the root node from the other connected components. \citet{forrest} introduced ILP formulations for different problems motivated by forest planning. One particular problem also had a predefined root node and demanded connectivity. They also used vertex separators to induce connectivity, and for that problem they only used vertex separators that separate the root node from other components. As their tests showed, and as our tests confirm, this reduces the runtime.\\
\citet{bomersbach} states that, as the number of vertex separators is potentially exponential, this can create an exponential number of constraints (Previously always said inequalities. Might think about terminology again). Too many constraints would potentially overload the model (Maybe cite fischetti-steiner). This would increase the runtime, as a lot of constraints would have to be respected which may not be necessary to induce connectivity in the solution.
So \citet{bomersbach}, \citet{fischetti_steiner_t} and \citet{forrest} treated these constraints as lazy constraints, which means that none of them are included in the initial model. Instead, integer solutions are computed iteratively. If an integer solution is not connected, minimal vertex separators which separate single components (In our case connected components and the root-component?) are identified via the following algorithms
\begin{algorithm}[H]
\SetAlgoLined
$DS^* := \{v | x_v = 1\}$ \\
$G' := G[DS^*]$\\
$C := $ set of all connected components of $G'$\\
$c_{root} := $ connected component that contains $v_{root}$\\
\For{all components $c$ in $C \setminus \{c_{root}\}$} {
$v := $ any node from $c$\\
$s_1 := $ findMinVertexSeparator($G$, $DS^*$, $v \in c$, $v_{root}$, $c$)\\
$s_2 := $ findMinVertexSeparator($G$, $DS^*$, $v_{root}$, $v \in c$, $c_{root}$)\\
\For{all $w_1 \in c$} {
add the following constraint to the model: $\sum_{s \in s_1}{x_s} \geq x_{w_1} + x_{v_{root}} - 1$\\
}
\For{all $w_2 \in c_{root}$} {
add the following constraint to the model: $\sum_{s \in s_2}{x_s} \geq x_{w_2} + x_{v} -1 $
}
}
\caption{Add violated constraints}
\end{algorithm}
\begin{algorithm}[H] \label{minSep}
\SetAlgoLined
$N(c_v) := $ neighbors of nodes of $c_v$ in $G$ that are not in $c_v$ (Maybe use the formal definition from methods?)\\
$G' := G$ with all edges between vertices in $c_v \cup N(c_v)$ removed\\
$R_w := $ vertices that can be reached from $w$ in $G'$\\
\Return $N(c_v) \cap R_w$
\caption{findMinVertexSeparator($G$, $DS^*$, $v \in c_v$, $w$, $c_v$)}
\end{algorithm}
The constraints \eqref{sep} containing these separators are then added to the model and the iteration process continues until a connected integer solution is found. Algorithm 2 is the same linear time algorithm as used in \citet{bomersbach} to identify minimal vertex separators.\\
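To illustrate both algorithms, assume a path $v_{root} - u_1 - u_2 - u_3 - u_4$ with $k = 2$ and the unconnected integer solution $DS^* = \{v_{root}, u_4\}$ (a hypothetical instance chosen only for this example). Then $G[DS^*]$ has the two components $c_{root} = \{v_{root}\}$ and $c = \{u_4\}$. For the separator on the side of $c$, the neighbors of $c$ are $\{u_3\}$. After removing the edge $(u_3,u_4)$ the vertices $v_{root}, u_1, u_2, u_3$ can still be reached from $v_{root}$, so $s_1 = \{u_3\}$. For the separator on the side of the root component, the neighbors of $c_{root}$ are $\{u_1\}$. After removing the edge $(v_{root},u_1)$ the vertices $u_4, u_3, u_2, u_1$ can be reached from $u_4$, so $s_2 = \{u_1\}$. The constraints added to the model are therefore
\[x_{u_3} \geq x_{u_4} + x_{v_{root}} - 1 \quad \text{and} \quad x_{u_1} \geq x_{v_{root}} + x_{u_4} - 1.\]
Both are violated by $DS^*$, since their left-hand sides are $0$ while their right-hand sides are $1$, so this solution cannot be generated again.\\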
For the case that there is no optimal solution of size $1$, an additional constraint is added to tighten the feasible region and to prevent unnecessary iterations.
\begin{equation} \label{neigh}
x_v \leq \sum_{w \in N(v)} x_w, \forall v \in V
\end{equation}
This constraint demands that for each vertex which is part of the dominating set at least one of its neighbors is also included. In \citep{bomersbach} and \citep{fischetti_steiner_t} this constraint is also part of the model. (Maybe mention that the "neighborhood" is always a minimum separator so this type of inequalities are valid)
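
For instance, assume a path $v_{root} - u_1 - u_2 - u_3 - u_4$ with $k = 2$ (a hypothetical instance used only for illustration). The unconnected set $\{v_{root}, u_4\}$ is a $2$-hop dominating set, but for $v = u_4$ inequality \eqref{neigh} reads
\[x_{u_4} \leq x_{u_3},\]
which is violated by $x_{u_4} = 1$ and $x_{u_3} = 0$. Solutions with such isolated vertices are therefore excluded from the start, without an additional iteration of the separation procedure.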
\subsection{Minimum connected $k$-hop Dominating Set} \label{khopmodel}
A connected $k$-hop dominating set is a $k$-hop dominating set DS such that $G[DS]$ is connected.(Maybe refer to methods as this is redundant?). Its ILP-Formulation consists of the objective target \eqref{obj} and constraints \eqref{khop} and a collection of constraints to induce connectivity(In the future different types of potential constraints should be added).
\subsection{Minimum rooted connected $k$-hop Dominating Set}
Let $v_{root} \in V$ be the predefined root. The ILP model of this problem is the ILP model of Section \ref{khopmodel} extended by the following constraint.
\begin{equation} \label{root}
x_{v_{root}} \geq 1
\end{equation}
\ No newline at end of file
We now describe the implementation of the ILP formulations from the Methods section. We implemented the ILP formulations and Algorithms \ref{alg:addConst} and \ref{alg:minSep} using Python version 3.7.5. As branch-and-cut framework and MIP solver we use Gurobi version 9.0.2. Gurobi offers a Python interface called \textit{gurobipy} which can be called from inside Python scripts. This interface offers access to the functions included in Gurobi.
Our implementation is embedded in a conda package. The package is called \textit{k\_ hop\_ dominating\_ set\_ gurobi}. The source of the package can be found on \url{https://gitlab.cs.uni-duesseldorf.de/albi/albi-students/bachelor-mario-surlemont/}.
After changing into the package directory, the package can be built via
\begin{lstlisting}[language=bash, frame=none, basicstyle=\small]
conda build .
\end{lstlisting}
To build the package, \textit{conda-build} needs to be installed.
Afterwards the package can be installed via
\begin{lstlisting}[language=bash, frame=none]
conda install --use-local k_hop_dominating_set_gurobi
\end{lstlisting}
The package depends on \textit{networkX}, \textit{matplotlib.pyplot} and \textit{gurobipy}.
Either the vertex separator constraints or the MTZ constraints can be used to enforce connectivity. MTZ constraints are selected via the optional argument \textit{-mtz}; by default the vertex separators are used. If required, the additional constraints presented in the Methods section can also be added to the model via the optional arguments \textit{-imn}, \textit{-rpl}, \textit{-gaus} and \textit{-pre}, where \textit{-rpl} abbreviates the naive constraint to reduce the path length and \textit{-gaus} abbreviates the constraint involving the Gaussian sum formula. The argument \textit{-pre} adds separators to the model before the solution process is started. When the intermediate node constraint is added via \textit{-imn}, the generated solutions might no longer be optimal.
As input, networkx graphs stored as ``.graphml'' or ``.gml'' files can be used. The ``.lp'' files from \citet{myky} can be used as well. A full program call is
\begin{lstlisting}[language=bash, frame=none, basicstyle=\small]
k_hop_dominating_set_gurobi (-mtz) (-imn) (-rpl) (-gaus) (-pre) graph.graphml k
\end{lstlisting}
If the vertex separators are chosen to induce connectivity, a lazy approach is used. Gurobi offers a callback function which is called during the solution procedure whenever certain events occur. The callback receives a code that indicates the type of the event that occurred. The callback code \textit{MIPSOL} indicates that a new MIP solution was found, that is, a solution in which all variables that must be integers take integer values, while the remaining variables can take arbitrary values (with respect to the inequalities).
As we only have integer variables in our model, the \textit{MIPSOL} code tells us that an integer solution $D^*$ was generated. In this case we use a function included in networkx to check whether the induced subgraph $G[D^*]$ is connected. If it is not, Algorithm \ref{alg:addConst} is used to add the corresponding constraints.
After a valid solution has been found, the input graph is plotted via \textit{matplotlib.pyplot}. The members of the dominating set are displayed in red while all other vertices are displayed in green.
The console output shows information about the solving process and the solution, such as the current upper and lower bounds.
\ No newline at end of file
\section{Introduction}\raggedbottom
Plants try to optimize their architecture to fulfil different objectives. One of them is to maximize the photosynthetic output. Another one is to minimize the cost of building the vascular system \citep{bio_netw}. To maximize the photosynthetic output plants optimize different parameters. As increasing one parameter can reduce another one, many parameters cannot be optimized at the same time \citep{bio_netw, bio_nutrient}.
In this thesis we focus on one particular mechanism by which plants can optimize their photosynthetic output.
To generate photosynthetic gains plants need sunlight, carbon dioxide and water. (Photosynthesis citation.)
Water and nutrients are supplied via the vascular system. Xylem transports water to the leaves, where the mesophyll cells produce sugars. These sugars are carried to the whole plant by phloem, a tissue specialized in transporting sugars.
Xylem and phloem cells are not able to generate sugars, but they are required to supply water to the mesophyll cells and to transport sugars. To receive a sufficient amount of water, mesophyll cells must not be more than 2-3 cells away from a xylem cell. Within this range water can flow by diffusion from the xylem cells through mesophyll cells that are not next to a xylem cell. At the same time sugars can be transported away from the mesophyll cells and supplied to the phloem if there is a phloem cell within a range of 2-3 cells. (Find citation.)
To produce as much sugar as possible the plant can try (driven by evolutionary processes) to maximize the number of mesophyll cells by minimizing the number of vein cells. In this thesis we describe a method to reproduce an optimal venation pattern that minimizes the number of vein cells subject to the constraint that all mesophyll cells need to be within a fixed range of vein cells. Leaf veins have a hierarchy. In general there is at least one thick major vein branch and several narrow minor branches. This hierarchy is completely disregarded in our problem formulation. Environmental circumstances also influence the venation pattern \citep{bio_veinh}. These influences on the venation are also completely disregarded in our model.
The input instance is given by an undirected graph $G = (V,E)$ that represents a leaf. The set of vertices $V$ represents the leaf cells while the set of edges $E$ represents the connections between the leaf cells in the form of plasmodesmata. To find an optimal pattern we use a special variant of the dominating set problem. For this problem we present an ILP formulation and an implementation in a branch-and-cut framework.
The dominating set problem and several of its variants are NP-hard \citep{ilp_np}. For our specific case we demand connectivity between the members of the set. Enforcing connectivity in ILP formulations is the subject of several publications, as it is not trivial.
In her bachelor's thesis \citet{myky} presented an alternative to ILPs. She implemented an algorithm for our problem using Answer Set Programming (ASP). For larger input instances the ASP version did not create optimal solutions in a reasonable amount of time. For the case where the dominating set does not need to be connected, \citet{myky} compared the runtime of an ILP version to the runtime of her ASP version. Her tests revealed that for this particular problem the ILP version performed significantly better.
The goal of this thesis is to formulate an ILP and to evaluate whether it performs better on our input graphs. We compared the ASP version with an ILP formulation that was created in this thesis. Contrary to the presumption that the ILP version could generate solutions faster, the ASP version was significantly faster on our input instances.
However the ILP-version outperformed the ASP-version on random graphs. The different characteristics and the runtime for the graphs can be taken from the results section. In the discussion section we discuss which characteristics are responsible for the differences in the runtime and what effect initiates them.
In Section 2, the Preliminaries, we give a short introduction to ILP and state important definitions. In Section 3 we define the methods to find an optimal venation pattern. Section 4 describes the implementation. Finally, in Section 5 and Section 6 we present the results, followed by a discussion of the effectiveness and limitations of the ILP solution and of the characteristics that make graphs perform better with either the ILP version or the ASP version.
\pagebreak
\section{Methods} \raggedbottom
We represent a leaf as a undirected graph $G = (V,E)$. Each vertex $v$ represents a leaf cell whereas a root $v_{root}$ is predefined. Leaf cells are connected to its neighbors cells via plasmodesmata. Those connections are represented by the edges $E$. We then look for a minimum set of nodes such that still the whole leaf can be supplied with water and the nutrients can be evacuated. For this purpose these vein cells need to be connected and the root cell needs to be part of the solution. (Vielleicht an dieser Stelle schon auf DS referenzieren)\\
We represent a plant's leaf as an undirected graph $G = (V,E)$. Each vertex $v$ represents a leaf cell, and one vertex $v_{root}$ is predefined as the root. Leaf cells are connected to their neighboring cells via plasmodesmata. Plasmodesmata are microscopic channels that link plant cells, enabling the transport of nutrients and water, among other things. Those connections are represented by the edges $E$. We then look for a minimum set of nodes such that the whole leaf can still be supplied with water and the nutrients can be collected. For this purpose the vein cells and the root need to be connected. Those cells form our solution, a rooted connected $k$-hop dominating set $D$.\\
(Somehow mention that the non-vein cells do not need to be directly adjacent to the vein cells.)
\\
We use a node-based ILP formulation to solve this special variant of the dominating set problem. We start by introducing a formulation for the general $k$-hop dominating set. As the objective function for our special variant remains the same, we then add constraints step by step until we arrive at an ILP formulation for the rooted connected $k$-hop dominating set.\\
As our implementation is node-based, we omit decision variables for edges and only assign a variable $x_v \in \{0,1\}$ to every $v \in V$, with the interpretation $x_v = 1 \Leftrightarrow v \in D$.
\subsection{Minimum Dominating Set}
As we try to minimize the number of vertices in the dominating set our ILP is given as:\\
\textit{objective target}:
\begin{equation} \label{obj}
\min \lbrace \sum_{v \in V}{x_v} \rbrace
\end{equation}
\textit{subject to:}
\begin{equation} \label{base}
......@@ -24,15 +24,23 @@ The objective target for this problem is the same as \eqref{obj}. But the family
This family of inequalities models the requirement that each vertex, or at least one member of its $k$-neighborhood, has to be included in the dominating set. For the case $k = 1$ this family is the same as \eqref{base}.
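As a small illustration (a sketch on a hypothetical instance, not one of the tested graphs), consider the path with vertices $v_1, \dots, v_5$, edges between consecutive vertices, and $k = 2$. Assuming the $k$-hop constraints require, for every vertex, that the vertex itself or some vertex within two hops is selected, the model written out reads
\[
\begin{array}{lll}
\min & x_{v_1} + x_{v_2} + x_{v_3} + x_{v_4} + x_{v_5} & \\
\mbox{s.t.} & x_{v_1} + x_{v_2} + x_{v_3} \geq 1 & (v_1)\\
 & x_{v_1} + x_{v_2} + x_{v_3} + x_{v_4} \geq 1 & (v_2)\\
 & x_{v_1} + x_{v_2} + x_{v_3} + x_{v_4} + x_{v_5} \geq 1 & (v_3)\\
 & x_{v_2} + x_{v_3} + x_{v_4} + x_{v_5} \geq 1 & (v_4)\\
 & x_{v_3} + x_{v_4} + x_{v_5} \geq 1 & (v_5)\\
 & x_{v_i} \in \{0,1\} & i = 1, \dots, 5
\end{array}
\]
Setting $x_{v_3} = 1$ and all other variables to $0$ satisfies every constraint, so the optimal value is $1$. For $k = 1$ the constraints of $v_1$ and $v_5$ shrink to $x_{v_1} + x_{v_2} \geq 1$ and $x_{v_4} + x_{v_5} \geq 1$. These share no variable, so at least two vertices are needed, for example $v_2$ and $v_4$.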
\subsection{Connectivity}
There are different approaches to enforce connectivity in an ILP. As this is not trivial, there have been many publications on this issue in recent years \citep{bomersbach}, \citep{fischetti_steiner_t}, \citep{fault_tolerant}, \citep{forrest}, \citep{on_imposing_con}, \citep{mtz}.
\subsubsection{Vertex separators}
One approach is to use so-called vertex separators. \citet{bomersbach} and \citet{fischetti_steiner_t} used this approach to create ILP-based algorithms for other graph-theoretical optimization problems that require the solution to be connected. \citet{bomersbach} presented an ILP formulation for the connected maximum coverage problem, and \citet{fischetti_steiner_t} proposed ILP formulations for different variants of the Steiner tree problem.
(that was solved in a branch and cut framework?).
As \citet{bomersbach} adopt the connectivity constraints of \citet{fischetti_steiner_t}, both ILP formulations use the same constraints to enforce connectivity.
\citet{bomersbach} compared the runtime of their implementation to previously proposed exact algorithms and to greedy approaches for the connected maximum coverage problem. In all test cases their implementation was significantly faster than all other exact algorithms. While the greedy algorithm was slightly faster in some cases, the proposed algorithm was more accurate.
The algorithm from \citet{fischetti_steiner_t} significantly improved the runtime of an exact solver for all the different Steiner tree problem variants, and their implementation won most of the categories of the 11th DIMACS challenge on Steiner trees.
\\
\begin{figure}
\centering
\includegraphics[width=10cm]{bilder/vertex_separator_illustration.eps}
\caption{Illustration of vertex separators. In all three pictures the set of green nodes separates the blue and the red node. In the middle and on the right picture minimal separators are illustrated. If one of the green nodes is turned into a black node, the green set would not separate the blue and the red node anymore. }
\label{mtz}
\end{figure}
Let $v,w \in V$. A $v$-$w$-separator is a subset $S_{v,w} \subset V$ such that $G[V \setminus S_{v,w}]$ has no path between $v$ and $w$. A minimal $v$-$w$-separator $S_{{v,w}_{min}}$ is a $v$-$w$-separator from which no vertex can be removed. That is, for every $y \in S_{{v,w}_{min}}$ the set $S_{{v,w}_{min}} \setminus \{y\}$ is not a separator for $v$ and $w$. Let $S(v,w)$ (Use different notation. This is misleading) denote the family of all minimal $v$-$w$-separators. \\
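Two small hypothetical graphs may help to illustrate these definitions. First, let $G$ be the cycle on the four vertices $v, a, w, b$ with edges $\{v,a\}, \{a,w\}, \{w,b\}, \{b,v\}$. Removing only $a$ leaves the path $v$-$b$-$w$, and removing only $b$ leaves the path $v$-$a$-$w$, so neither $\{a\}$ nor $\{b\}$ is a $v$-$w$-separator. The set $\{a,b\}$ separates $v$ and $w$ and is minimal, hence
\[S(v,w) = \{\{a,b\}\}.\]
Second, let $G$ be the path $v$-$a$-$b$-$w$. Here $\{a\}$ and $\{b\}$ are both minimal $v$-$w$-separators, whereas $\{a,b\}$ is a $v$-$w$-separator that is not minimal, hence
\[S(v,w) = \{\{a\}, \{b\}\}.\]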
In \citep{bomersbach} and \citep{fischetti_steiner_t} the following family of inequalities is used to enforce connectivity:
......@@ -47,10 +55,10 @@ x_v + x_w \leq \sum_{u \in S_{v,w}}{x_u} + 1, \forall v, w \in V, v \neq w, \for
\end{equation}
for minimum vertex separators that include the root node.
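As a brief worked example of the general separator inequality $x_v + x_w \leq \sum_{u \in S_{v,w}}{x_u} + 1$ (on a hypothetical graph chosen for illustration), let $G$ be the path $v$-$a$-$b$-$w$. Its minimal $v$-$w$-separators are $\{a\}$ and $\{b\}$, which yield
\[
\begin{array}{l}
x_v + x_w \leq x_a + 1\\
x_v + x_w \leq x_b + 1
\end{array}
\]
If both $v$ and $w$ are selected, the left-hand side equals $2$, so $x_a = x_b = 1$ is forced and the whole connecting path becomes part of the solution. If at most one of $v$ and $w$ is selected, the left-hand side is at most $1$ and both inequalities hold for any choice of $x_a$ and $x_b$.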
The number of minimal vertex separator constraints is potentially exponential \citep{bomersbach}. Therefore in \citep{bomersbach}, \citep{fischetti_steiner_t} and \citep{forrest} these constraints are treated as lazy constraints, which means that none of them are included in the initial model. Instead, integer solutions are computed iteratively \citep{bomersbach}, \citep{fischetti_steiner_t}. If such a solution is not connected, in \citep{bomersbach} and \citep{fischetti_steiner_t} minimal vertex separators that separate single components are identified via a linear-time algorithm, while in \citep{forrest} a classical max-flow min-cut computation is used to identify violated constraints.\\
Our algorithm to identify and add violated constraints is analogous to the one from \citep{bomersbach}, with the exception that we only search for violated constraints that include the root node.
\begin{algorithm}[H] \label{alg:addConst}
\SetAlgoLined
$DS^* := \{v \mid x_v = 1\}$ \\
$G' := G[DS^*]$\\
......@@ -60,7 +68,7 @@ Our algorithm to identify and add violated constraints is analogous the one from
\For{all components $c$ in $C \setminus \{c_{root}\}$} {
$v := $ any node from $c$\\
$s_1 := $ findMinVertexSeparator($G$, $DS^*$, $v \in c$, $v_{root}$, $c_{root}$)\\
$s_2 :=$ findMinVertexSeparator($G$, $DS^*$, $v_{root}$, $v \in c$, $c$)\\
\For{all $w_1 \in c$} {
add the following constraint to the model: $\sum_{s \in s_1}{x_s} \geq x_{w_1} + x_{v_{root}} - 1$\\
}
......@@ -82,7 +90,7 @@ This algorithm is executed each time an integer solution is resolved (using a br
\caption{findMinVertexSeparator($G$, $DS^*$, $v \in c_v$, $w$, $c_v$)}
\end{algorithm}
The algorithm above detects a minimal vertex separator that separates the node $w$ from the connected component $c_v$. It is taken from \citet{bomersbach}, who in turn adopted it from \citet{fischetti_steiner_t}. This method finds the minimal vertex separator that is closest to the component $c_v$. Figure \ref{pic:min_sep} illustrates the process. Suppose the red nodes form an unconnected solution $D^*$. The set of blue nodes is the minimal separator closest to the upper connected component, while the set of green nodes is the minimal separator closest to the component containing the root. The middle and the right picture show step \ref{remEdges} of algorithm \ref{alg:minSep}. After removing all edges between a component and its neighborhood, the blue nodes in the middle picture and the green nodes in the right picture are still reachable from the other component. Therefore the algorithm returns these nodes as the minimal vertex separator.
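The procedure can be traced on a small hypothetical instance. This is a sketch that follows the textual description above; the graph and the solution are assumptions made for illustration. Let $G$ be the path $v_{root}$-$a$-$b$-$c$-$d$ and suppose the solver returns the unconnected solution $D^* = \{v_{root}, d\}$, which is a feasible $2$-hop dominating set. The induced subgraph has the components $\{v_{root}\}$ and $\{d\}$.
\[
\begin{array}{ll}
\mbox{component } \{d\}: & \mbox{neighborhood } \{c\},\ \mbox{remove edge } \{c,d\},\\
 & \mbox{reachable from } v_{root}:\ \{v_{root}, a, b, c\},\ \mbox{separator } \{c\}\\
\mbox{component } \{v_{root}\}: & \mbox{neighborhood } \{a\},\ \mbox{remove edge } \{v_{root},a\},\\
 & \mbox{reachable from } d:\ \{d, c, b, a\},\ \mbox{separator } \{a\}
\end{array}
\]
In each case the separator consists of those neighbors of the component that can still be reached from the other side. The resulting lazy constraints are
\[
\begin{array}{l}
x_c \geq x_d + x_{v_{root}} - 1\\
x_a \geq x_d + x_{v_{root}} - 1
\end{array}
\]
Both cut off $D^*$, since for $D^*$ the right-hand side equals $1$ while $x_a = x_c = 0$.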
\begin{figure}
\centering
......@@ -196,5 +204,5 @@ As this constraint did not reduced the runtime wie tried to refine it. There are
This circumstance led to the following constraint, which makes use of the Gaussian sum formula. The idea is still to limit the distance between the root node $v_{root}$ and all members of $D$. In this advanced formulation we limit the sum of the distances to $\sum_{i=1}^{|D^*|}{i}$. This constraint cuts off unconnected solutions that are valid when only the previous constraint \eqref{rpl} is used. However, as our tests revealed, this constraint did not improve performance but even increased the runtime (probably because it adds too much complexity to the model).
(Maybe also mention that this constraint in isolation allows solutions which are forbidden using the previous one)
\subsubsection{Preventively adding separators}
We use the lazy approach to avoid adding too many constraints that are not required to obtain valid solutions. Nevertheless, we evaluated whether adding particular separator constraints up front could reduce the runtime. This approach may yield a tighter LP bound and prevent unnecessary iterations.
......@@ -3,32 +3,31 @@
Linear programming is a technique to minimize linear functions.
The following definition is based on the book \citep{fischetti2019introduction}.\\
A linear program (LP) consists of a linear objective function that is minimized subject to a set of linear inequalities. \\
\\
Linear programs can be expressed as
\[\min\{c^Tx : Ax \geq b, x \geq 0\}\]
where $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$ are constant vectors. The matrix $A \in \mathbb{R}^{m \times n}$ contains the coefficients of the $m$ inequalities. We minimize the objective function $c^Tx \in \mathbb{R}$. The vector inequality $Ax \geq b$ has to be satisfied by every valid solution.
The vector $x \in \mathbb{R}^n$ describes possible solutions. If $x$ satisfies all inequalities, it is called a feasible solution. A feasible solution $x^*$ is optimal if no other feasible solution has a smaller objective value.
\\
\\
Integer linear programs (ILPs) are linear programs with the additional restriction that all variables have to be integers: $x \in \mathbb{Z}^n$.
The decision variant of an ILP is NP-complete \citep{ilp_np}.
\\
Each row $j$ of $Ax \geq b$ can be expressed as the sum $\sum_{i=1}^{n}{a_{ji}x_i} \geq b_j$. The objective function can be expressed as $\sum_{i=1}^n{c_ix_i}$. In this thesis we use this notation as we find it more readable.
Combinatorial optimisation problems can be modelled with ILPs. Each variable $x_i \in \{0,1\}$ encodes the decision whether to include item $i \in \{1,...,n\}$ in the solution.
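To make the notation concrete, consider the following small instance. The numbers are chosen freely for illustration and do not stem from our experiments:
\[\min\{x_1 + x_2 : x_1 + 2x_2 \geq 2,\ 2x_1 + x_2 \geq 2,\ x \geq 0\}\]
Here $n = m = 2$, $c = (1,1)^T$, $b = (2,2)^T$ and
\[A = \left(\begin{array}{cc} 1 & 2\\ 2 & 1 \end{array}\right).\]
Adding both inequalities gives $3(x_1 + x_2) \geq 4$, so every feasible solution has an objective value of at least $\frac{4}{3}$. The point $x^* = (\frac{2}{3}, \frac{2}{3})$ is feasible and attains this value, hence it is optimal for the LP. If we additionally require $x \in \mathbb{Z}^2$, the bound $x_1 + x_2 \geq \frac{4}{3}$ implies $x_1 + x_2 \geq 2$, which is attained by $x = (1,1)$. The optimal value of the ILP is therefore $2$, strictly larger than the optimal value of the LP.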
\subsection{Definitions}
\begin{definition}[Neighborhood]
Given an undirected graph $G = (V,E)$, let $N(v)$ denote the neighborhood of a vertex $v$. $N(v)$ can formally be described as follows: \[w \in N(v) \Leftrightarrow (v,w) \in E\]
\end{definition}
(Maybe leaf base case out as the s.t. part is different from k-hop version. So after introducing in the implementation part it would be replaced.)
\begin{definition}[Dominating Set]
Given an undirected graph $G = (V,E)$, a dominating set is a subset $D \subset V$ such that each vertex $v \in V$ is either included in the dominating set or adjacent to at least one vertex which is included in the dominating set. For a dominating set $D$ the following statement holds:
\[\forall v \in V \setminus D: \exists u \in D, u \in N(v)\]
\end{definition}
\begin{definition}[k-Neighborhood]
The neighborhood $N(v)$ of a single vertex is defined above. Let the neighborhood of a set of vertices $W \subset V$ be defined as
\[N(W) := \bigcup_{u \in W} N(u)\]
Let $k \in \mathbb{N}$.
With the help of this definition, the $k$-neighborhood $N_k(v)$ of a single vertex $v \in V$ can be defined recursively as:
......@@ -37,17 +36,20 @@ whereas $N_1(v) = N(v)$. So $N_k(v)$ is a set of all vertices which can be reach
\end{definition}
\begin{definition}[k-hop Dominating Set]
A $k$-hop dominating set is a subset $D \subset V$ such that for each vertex $v \in V \setminus D$ there exists a path of length $l \leq k$ between $v$ and at least one vertex $d \in D$. Thus $D$ is a $k$-hop dominating set if it satisfies the following requirement:
\[\forall v \in V \setminus D: \exists u \in D, u \in N_k(v)\]
This means that each vertex is either part of $D$ or in $N_k(w)$ for some $w \in D$.
\end{definition}
\begin{definition}[connected k-hop Dominating Set]
A $k$-hop dominating set $D$ is a connected $k$-hop dominating set if the induced subgraph $G[D]$ is connected.
\end{definition}
\begin{definition}[rooted connected k-hop Dominating Set]
Let $v_{root} \in V$ be a predefined vertex, the \emph{root}.
A rooted connected $k$-hop dominating set $D$ is a connected $k$-hop dominating set which also includes $v_{root}$.
\end{definition}
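The following hypothetical example illustrates how the three definitions differ. Let $G$ be the path with vertices $v_1, \dots, v_7$ and edges between consecutive vertices, and let $k = 2$.
\[
\begin{array}{lll}
\mbox{$k$-hop dominating set:} & D = \{v_3, v_6\} & |D| = 2\\
\mbox{connected $k$-hop dominating set:} & D = \{v_3, v_4, v_5\} & |D| = 3\\
\mbox{rooted connected, } v_{root} = v_1: & D = \{v_1, v_2, v_3, v_4, v_5\} & |D| = 5
\end{array}
\]
Each of these sets is of minimum size for its variant. A single vertex reaches at most four other vertices within two hops, so one vertex cannot dominate all seven. A connected solution needs a vertex within two hops of $v_1$, that is one of $v_1, v_2, v_3$, and a vertex within two hops of $v_7$, that is one of $v_5, v_6, v_7$. On a path, connecting them requires $v_3$, $v_4$ and $v_5$. If the root $v_1$ must also be included, the solution has to contain the whole path from $v_1$ to $v_5$.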
(Add a definition for what "connected" means)
\pagebreak