# Week 4
## Python
### implementation
* Refactor code.
- Fix bug with separators. ✔
- Add a CLI option to select which types of inequalities are included. ✔
* Implement different types of constraints.
- _Solving the Maximum-Weight Connected Subgraph Problem to Optimality_ ❌ (moved to backlog)
- _Solving the Connected Dominating Set Problem and Power Dominating Set Problem by Integer Programming_ ✔
- Only the one type of constraint that performed best in this paper. It performs badly on our graphs.
### runtime
* Do tests according to the test concept and store the data in some structured format (CSV/XML/JSON) that can be imported into Python or R. ✔
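A minimal sketch of what the structured result storage could look like with Python's `csv` module (the column names and values below are illustrative assumptions, not the actual test harness):

```python
import csv

def write_results(rows, path):
    """Store one test run per row so the CSV imports cleanly into pandas or R."""
    # Hypothetical columns; the real test concept decides what to record.
    fields = ["graph", "n_nodes", "n_edges", "density", "runtime_s", "n_lazy_constraints"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)

write_results(
    [{"graph": "leaf-small", "n_nodes": 70, "n_edges": 120,
      "density": 0.05, "runtime_s": 1.3, "n_lazy_constraints": 42}],
    "results.csv",
)
```

In R the same file would load with `read.csv("results.csv")`, which is why CSV is probably the least friction of the three formats.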
## thesis
* Begin paragraph about results and add preliminary results and observations. ❌ (moved to week 5)
- Create a concept for how to test and which test results to include. (✔) (a preliminary concept)
## literature
* Read _Imposing Connectivity Constraints in Forest Planning Models_.
- Check for different constraints which could strengthen the formulation. ❌ (moved to week 5. But with focus on symmetry breaking)
* Read _Solving the Connected Dominating Set Problem and Power Dominating Set Problem by Integer Programming_ ✔
- Different constraints to induce connectivity are compared. Try to implement some of them to compare them and see which one is best. Check whether this is related to a characteristic of your graphs, as the literature suggests that for some types of graphs some approaches are better.
* Check _On imposing connectivity constraints in integer programs_ again to see if there is some other literature which you are missing. ❌ (moved to week 5)
# Week 5
## Python
### implementation
* Try to implement a symmetry breaker to check whether this affects in particular the runtime needed to close the gap and to exclude all unconnected solutions with at most as many nodes as an optimal connected one.
- Maybe implement My Ky's symmetry breaker, even if it is not optimal/100% correct, just to observe whether there is a connection between symmetrical unconnected solutions and the runtime, and how strong this connection is. (Possible indicators: runtime, number of lazily added constraints) ❌
- Unfortunately her symmetry breaker made use of edge variables, and our model consists only of node variables.
- So far no symmetry breaker has come to mind, and I haven't recognised one in a paper yet.
### runtime
* Create random graphs and try to compare runtime between ILP and ASP.
- Measure some characteristics of the graphs such as density, |V|, |E|, maximum degree, minimum degree, average degree (maybe standard deviation of degree or median degree?) ✔
- The implementation has been tested on random graphs of different sizes (|V| = 10, 20, 50, 100, 250, 500), with nine density levels (0.1, 0.2, ..., 0.9) per size.
- The results of this implementation were comparable to those from _An Integer Programming Approach for Fault-Tolerant Connected Dominating Sets*_.
- On random graphs the implementation can handle many more vertices than on our graphs.
- A *very* interesting fact is that the implementation performs far worse on gridlike graphs, even when those graphs have similar density and far fewer vertices.
- A major problem seems to be a low average degree and a low maximum degree of the nodes!
- It seems, as I assumed, that when "high value" vertices are missing there are too many alternatives which have to be excluded in the iteration process.
- I should create tables with the differences in characteristics of those gridlike graphs and random graphs.
- On "thin" random graphs (width << length) the algorithm performs much better, as there are not so many alternatives, I assume.
- A comparison between ILP and ASP hasn't been done yet but is important to see if this implementation is at least competitive on random graphs! ❌
- Measure and note the gap between |D| of a minimal connected and a minimal unconnected solution. For each |D|, measure the number of unconnected solutions that were found (ILP only) and how many constraints were added in total.
  - Constraints ✔
  - The number of unconnected solutions is equal to the number of lazily added constraints ✔
  - It would still be interesting to measure the number of solutions which exist for each size, completely omitting any connectivity. ❌
- Measure the time which was needed to find the first (nearly) optimal solution (strong upper bound) and the time needed to close the gap. ✔
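The graph generation and characteristic measurement above can be sketched roughly as follows (a stdlib-only illustration; the helper names `make_gnp` and `graph_stats` are assumptions, not the real code):

```python
import random
import statistics

def make_gnp(n, p, rng):
    """Erdos-Renyi G(n, p): include each possible edge with probability p."""
    edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]
    return n, edges

def graph_stats(n, edges):
    """The characteristics from the week-5 list, per graph."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    m = len(edges)
    return {
        "n": n,
        "m": m,
        "density": 2 * m / (n * (n - 1)),
        "max_degree": max(degree),
        "min_degree": min(degree),
        "avg_degree": statistics.mean(degree),
        "stdev_degree": statistics.stdev(degree),
        "median_degree": statistics.median(degree),
    }

rng = random.Random(0)  # seeded, so runs are reproducible
for p in (0.1, 0.5, 0.9):
    n, edges = make_gnp(50, p, rng)
    stats = graph_stats(n, edges)
```

Each `stats` dict maps directly onto one CSV row, which keeps the measurement step and the result storage decoupled.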
* Make some more detailed tests and check for the following connections:
- To what degree does the number of unconnected k-hop solutions with at most as many nodes as an optimal connected solution affect the runtime? (Calculate how many of them exist and add this data as a column to a table of test results)
- Is there a clear correlation between the number of constraints that were added and the runtime for (possibly unconnected) solutions?
- For graphs where the gap between an optimal connected solution and the smallest unconnected solution is rather big, check whether this approach (our implementation) might be inapplicable because too much time is wasted excluding all possible unconnected solutions and raising the lower bound.
* Implement a heuristic to start with a stronger lower bound. ✔
- For a symmetrical leaf graph whose root is centered (at least on the x-axis), at least ![width-2*k+height-k-\frac{width}{2}-k](https://render.githubusercontent.com/render/math?math=width-2*k%2Bheight-k-%5Cfrac%7Bwidth%7D%7B2%7D-k) nodes are needed to provide a connected dominating set.
- This is what it takes to reach both rims in the width and the rim at the top.
- Unfortunately this did not improve the runtime, at least for the bigger-leaf example with k=2; it drastically increased it.
- So for now I would not follow this idea and will try to find another technique to generate a better lower bound.
## thesis
* Begin paragraph about results and add preliminary results and observations. ❌
- Refine the concept how to test and which test results you want to include. (✔)
- The concept was refined: additional parameters were added which could underpin (or disprove) the assumption that our graphs, and gridlike graphs in general, are hard to solve with this approach.
- The paragraph as a latex file wasn't started yet. ❌
- This has the HIGHEST priority for week 6, as writing is (for me) the least pleasurable task.
## literature
* Read _Imposing Connectivity Constraints in Forest Planning Models_. ✔
- Check for different constraints which could strengthen the formulation. (with focus on symmetry breaking)
- They did not mention a symmetry breaker *but* they mentioned some other types of inequalities which could strengthen the formulation according to the connectivity specification.
- The two most promising seem to be
- Only adding separators involving the root node but not between connected components
- The rooted ring inequalities
- An interesting point: for their specific problems (which were not really close to MCDS and had only connectivity in common) they achieved a much stronger LP bound.
- They added cuts before an ILP solution was found and also added cuts for LP solutions.
- They used different subproblems which they solved iteratively and used the previous results as a heuristic for the next iteration step.
* Read _An Efficient Branch and Cut Algorithm to Find Frequently Mutated Subnetworks in Cancer_ again with focus on symmetry breaking. ✔
- No symmetry breakers were mentioned.
* Read through _An Integer Programming Approach for Fault-Tolerant Connected Dominating Sets*_ again and check for symmetry breaking or other constraints to tighten up the space of solutions. ✔
- I could not find anything about symmetry breaking or additional inequalities for the case k=d=1 (which is standard MCDS). But the table of results was interesting because they also tested their implementation for the case k=d=1, which is then equal to our ILP formulation. Their results were not bad, but unfortunately I could not find any more detailed description of their test graphs: only the number of nodes and the density are shown. Those two properties are not sufficient, as my personal tests on random graphs revealed.
- An interesting observation from their result tables is that their largest test graph (with lowest density), which has 200 nodes, has an optimal solution consisting of "only" 26 nodes, whereas an optimal solution for our bigger-leaf test graph in the case k=1 consists of 24 nodes while the graph has only 70 nodes. Both graphs have approximately the same density. So their graph must contain some nodes with a higher value/degree, such that adding those nodes yields a higher profit than adding others; there is no equally good alternative to adding these nodes. This reduces the number of iterations and (as one can see in the table of results) the number of lazily added constraints. As a consequence I assume this is an important factor in reducing the runtime.
# Week 6
## Python
### Implementation
* Implement the mentioned constraints from _Imposing Connectivity Constraints in Forest Planning Models_ (Thursday) (✔)
- From now on only separators between the root node and other components are added to the model. This improves the runtime. ✔
- The ring inequalities haven't been implemented yet. ❌
- I would prefer to focus on the writing and further testing prior to improving the formulation. If there is enough time left, I will try to further improve the runtime.
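The root-separator idea above can be sketched as follows (an illustrative cut-generation helper, not the actual solver callback; presumably the lazy constraint added per separator would be `sum(x[v] for v in separator) >= 1`):

```python
from collections import deque

def components(nodes, adj):
    """Connected components of the subgraph induced by `nodes` (BFS)."""
    nodes, seen, comps = set(nodes), set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.add(u)
            for w in adj[u]:
                if w in nodes and w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def root_separators(selected, root, adj):
    """One separator per selected component that does not contain the root.

    The separator is the open neighbourhood of that component: at least one
    of these vertices must enter any solution that connects it to the root.
    """
    seps = []
    for comp in components(selected, adj):
        if root not in comp:
            seps.append({w for u in comp for w in adj[u]} - comp)
    return seps

# Tiny example: path 0-1-2-3-4 with root 0; solution {0, 3, 4} is disconnected.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
seps = root_separators({0, 3, 4}, 0, adj)  # one separator: {2}
```

Restricting the cuts to components without the root (instead of all pairs of components) matches the "only separators involving the root node" refinement and generates far fewer constraints per callback.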
* Maybe implement that cuts are added even for fractional solutions to strengthen the LP bound. (thursday) ❌ (moved to backlog)
### runtime
* Test the ASP version on the graphs from week 5 to compare them. (saturday) ✔
- The ASP version performs quite badly on many random graphs where the ILP version performs quite well. One key factor I could determine: on those random graphs where the ILP version was good, the difference in size between an unconnected solution and a connected one is rather low. So the ILP version seems to handle those instances where there is no big difference well.
- The ASP version performed better on grid graphs in general. This is what I expected, as the ASP version was better on the leaf graphs as well. My assumption is still that there are too many "alternative" unconnected solutions, where it is much cheaper to replace nodes of a previous unconnected solution and generate a new unconnected solution than to add the vertices which separate them. For those instances where the ILP needs a lot of time to solve, the number of lazily added constraints is very high and the gap decreases very slowly (my test results); both facts seem to underline my assumption.
* Test the (new) implementation on the usual graphs and those from week 5. (saturday) (❌)
- Only tested on a few graphs, as the testing takes a lot of time. But the decrease in runtime was significant.
* Create clean tables and CSV files. (sunday) (❌)
- As I tested many graphs, I have to further select which graphs I really want to include, and I might have to run new tests if I change the implementation.
## thesis
* Begin the paragraph of results. (wednesday) (✔)
- I wrote the bullet points that I want to include into a text file, but I haven't created a TeX file yet.
* Refactor the paragraph for Implementation and methods. (wednesday) ✔
# Week 7
## thesis
* According to Eline's feedback.
- Move definitions from methods to a preliminary section and move the intro into ILP's to this section. ✔
- Move the stuff from the implementation section to the methods section and mix it up a bit. (✔)
- (The implementation section was removed for now and its content was moved to methods. Most of the definitions from the preliminaries section should be moved to the methods section and mixed in with the content.)
- Rework the part with the vertex separators and be more precise and specific. ✔
- Add missing stuff. (❌)
- Different type of constraints to enforce connectivity. (✔)
- I wrote a German draft to get an idea of how to do it.
- Different constraints that I tried in order to strengthen the ILP formulation, and preventively adding separator constraints. ❌
## python
### runtime
* Check how large the difference is between the size of a minimum dominating set and a rooted connected minimum dominating set for all test instances.
- Done for leafs. ✔
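For very small instances this difference can be measured by brute force. The sketch below (illustrative helper names, exponential runtime, tiny graphs only) compares the two optima on a path graph:

```python
from itertools import combinations

def dominates(sub, adj):
    """Every vertex is in `sub` or has a neighbour in `sub`."""
    return all(v in sub or any(u in sub for u in adj[v]) for v in adj)

def connected(sub, adj):
    """The subgraph induced by `sub` is connected (DFS from any member)."""
    if not sub:
        return False
    seen, stack = set(), [next(iter(sub))]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(w for w in adj[u] if w in sub)
    return seen == set(sub)

def min_ds_size(adj, root=None, need_connected=False):
    """Smallest (optionally rooted, optionally connected) dominating set."""
    nodes = sorted(adj)
    for k in range(1, len(nodes) + 1):
        for sub in combinations(nodes, k):
            if root is not None and root not in sub:
                continue
            if dominates(set(sub), adj) and (not need_connected or connected(set(sub), adj)):
                return k
    return None

# Path 0-1-2-3-4-5, root 0: an MDS needs 2 nodes ({1, 4}),
# but a rooted connected dominating set needs the whole prefix {0, 1, 2, 3, 4}.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
gap = min_ds_size(adj, root=0, need_connected=True) - min_ds_size(adj)
```

On real test instances this enumeration is hopeless, of course; the point is only to define precisely what "the difference" means before measuring it with the solvers.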
# Week 8
## thesis
* Mix the stuff from the methods section and the definitions up a bit.
* Add missing stuff. (Highest priority! Rather have awful stuff that can be refined than the feeling of getting stuck!)
- Different type of constraints to enforce connectivity.
- Different constraints that I tried in order to strengthen the ILP formulation, and preventively adding separator constraints.
- Introduction
- Table of results
- Discussion
- Implementation
## python
### runtime
* Check how large the difference is between the size of a minimum dominating set and a rooted connected minimum dominating set for all test instances.
- GNM 250 graphs.
- GNM 100 graphs.
* For the leaf instances, the grid graphs and some of the random graphs:
- measure the size of a minimal unconnected solution and how fast the added separators close the gap.
# Backlog
## Python
### Implementation
* Remove lp_to_graph from package and instead use some other format to store graphs and use them as input. ✔
* Implement different types of constraints.
- _Solving the Maximum-Weight Connected Subgraph Problem to Optimality_
* Carefully check the literature, i.e. _An Integer Programming Approach for Fault-Tolerant Connected Dominating Sets*_, _An Efficient Branch and Cut Algorithm to Find Frequently Mutated Subnetworks in Cancer_, and _Thinning out Steiner trees: a node-based model for uniform edge costs_, to check whether they did anything differently regarding the separating constraints.
* Check if something from the algorithmic framework of _Thinning out Steiner trees: a node-based model for uniform edge costs_ can be applied to our case. (Local Branching, Heuristic, etc.)
* Implement a heuristic to provide a better upper bound.
- The intermediate root constraint could be useful to do so.
* Maybe *change the cost function for the nodes*.
- At the moment every node has a cost of 1.
- It could work to give each node the cost of the shortest path between the node and the root.
- With this technique we would encourage the solver to prefer nodes that are close to the root, avoiding that far-distant nodes are added and connected subcomponents far away from the root are generated.
- At least this could lead to a heuristic.
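The distance-based cost idea could be sketched with a plain BFS from the root (the function name is an assumption, not existing package code):

```python
from collections import deque

def root_distance_costs(adj, root):
    """Candidate cost function: cost(v) = shortest-path distance from the root."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

# On a path 0-1-2-3 with root 0, costs grow with distance from the root.
costs = root_distance_costs({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, 0)
```

Note that the root itself gets cost 0 here; to preserve the current "every node costs at least 1" behaviour one could shift all values by 1.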
## thesis
* Unify style of the entries of the bibliography.
### packaging
* Add unit tests to the package
* Add dependencies to the recipe
* Find a place to store the code if the package will be published. (Maybe a standard GitHub repo)