diff --git a/recommender.tex b/recommender.tex
index 10bc2221d2cfd78ab037067940cc39c3f00c921b..8266b371206b73766850f0df9edfd9c1ebbe484b 100644
--- a/recommender.tex
+++ b/recommender.tex
@@ -26,14 +26,14 @@ The core idea of \textit{matrix factorization} is to supplement the not complete
 \subsubsection{Basic Matrix-Factorization}
 The simplest way to solve \textit{matrix-factorization} is to combine the \textit{feature vectors} of the \textit{users} and the \textit{items} using the \textit{inner product}. The result is the predicted \textit{user-item interaction}. In addition, the \textit{error} between known and predicted \textit{ratings} should be as small as possible. Therefore, $\min_{p_u, q_i}{\sum_{(u,i) \in \mathcal{B}} (r_{ui} - \hat{r}_{ui})^{2}}$ is defined as the associated \textit{minimization problem}.
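The prediction and one summand of this minimization problem can be sketched in a few lines of NumPy; the factor vectors and the known rating below are made-up illustrative values, not taken from the text:

```python
import numpy as np

# Illustrative latent factor vectors for one user u and one item i
# (dimension f = 3; the values are invented for this sketch).
p_u = np.array([0.5, 1.2, -0.3])
q_i = np.array([0.8, 0.4, 1.0])

# The predicted user-item interaction is the inner product of both vectors.
r_hat = q_i @ p_u

# Squared error against a known rating r_ui, i.e. one summand of the
# minimization problem over all observed pairs (u, i) in B.
r_ui = 4.0
error = (r_ui - r_hat) ** 2
```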
 
-\subsubsection{Regulated Matrix-Factorization}
+\subsubsection{Regulated Matrix-Factorization}\label{subsec:rmf}
 This problem extends the \textit{basic matrix-factorization} by a \textit{regulation factor} $\lambda$ in the corresponding \textit{minimization problem}. Since $\mathcal{R}$ is only sparsely populated, \textit{overfitting} may occur due to learning from the few known values. The problem with \textit{overfitting} is that the generated \textit{ratings} fit the known values too closely and generalize poorly. To counteract this, the magnitudes of the factor vectors are taken into account. High magnitudes are penalized by a term $\lambda(\lVert q_i \rVert^2 + \lVert p_u \rVert^2)$ in the \textit{minimization problem}. Overall, the \textit{minimization problem} $\min_{p_u, q_i}{\sum_{(u,i) \in \mathcal{B}} (r_{ui} - \hat{r}_{ui})^{2} + \lambda(\lVert q_i \rVert^2 + \lVert p_u \rVert^2)}$ is to be solved.
 The idea is that especially large entries in $q_i$ or $p_u$ cause the norms $\lVert q_i \rVert, \lVert p_u \rVert$ to become larger. Squaring these norms penalizes large entries even more strongly: small values are rewarded and large values are penalized. Additionally, the influence of this term can be regulated by $\lambda$.
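This regularized objective can be written out as a short NumPy sketch; the ratings, matrix shapes, and the helper name `regularized_loss` are assumptions made for illustration only:

```python
import numpy as np

# Hypothetical observed ratings for pairs (u, i); the indices and values
# are invented for this sketch.
ratings = {(0, 0): 4.0, (0, 1): 2.0, (1, 0): 5.0}

# Small random factor matrices: P is |U| x f, Q is f x |I|, with f = 2.
rng = np.random.default_rng(0)
P = rng.normal(0.0, 0.1, size=(2, 2))
Q = rng.normal(0.0, 0.1, size=(2, 2))

def regularized_loss(P, Q, ratings, lam):
    """Squared error plus the penalty lam * (||q_i||^2 + ||p_u||^2)."""
    loss = 0.0
    for (u, i), r_ui in ratings.items():
        r_hat = Q[:, i] @ P[u]
        loss += (r_ui - r_hat) ** 2 + lam * (Q[:, i] @ Q[:, i] + P[u] @ P[u])
    return loss
```

A larger `lam` increases the penalty on large factor magnitudes, which is exactly the role of $\lambda$ in the text.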
 
 \subsubsection{Weighted Regulated Matrix-Factorization}
 A \textit{regulation factor} $\lambda$ is introduced in analogy to \textit{regulated matrix-factorization}. Additional \textit{weights} $\alpha$ and $\beta$ are introduced to take the individual magnitude of each vector into account. The \textit{minimization problem} then corresponds to $\min_{p_u, q_i}{\sum_{(u,i) \in \mathcal{B}} (r_{ui} - \hat{r}_{ui})^{2} + \lambda(\alpha\lVert q_i \rVert^2 + \beta\lVert p_u \rVert^2)}$.
 
-\subsubsection{Biased Matrix-Factorization}
+\subsubsection{Biased Matrix-Factorization}\label{subsec:bmf}
 A major advantage of \textit{matrix-factorization} is the ability to model simple relationships according to the application. However, an ideal data source cannot always be assumed. Due to the \textit{natural interaction} of the \textit{users} with the \textit{items}, \textit{preferences} arise. Such \textit{preferences} lead to \textit{behaviour patterns} which manifest themselves in the form of a \textit{bias} in the data. In principle, a \textit{bias} is not harmful, but it must be taken into account when modeling the \textit{recommender system}.
 The most popular model that takes \textit{bias} into account is called \textit{biased matrix-factorization}.
 In this model, a \textit{missing rating} is no longer determined solely by the \textit{inner product} of the two vectors $q_i$ and $p_u$. Rather, the \textit{bias} is also considered. Accordingly, a \textit{missing rating} is calculated by $\hat{r}_{ui} = b_{ui} + q_i^Tp_u$, where $b_{ui}$ is the \textit{bias} of a \textit{user} $u$ and an \textit{item} $i$. The \textit{bias} is determined by $b_{ui}=\mu + b_u + b_i$. The parameter $\mu$ is the \textit{global average} of all \textit{ratings} $r_{ui} \in \mathcal{R}$.
@@ -42,7 +42,7 @@ Here $\mu_u$ denotes the \textit{average} of all \textit{assigned ratings} of th
 Thus $b_u$ indicates the \textit{deviation} of the \textit{average assigned rating} of a \textit{user} from the \textit{global average}. Similarly, $b_i$ indicates the \textit{deviation} of the \textit{average rating} of an item from the \textit{global average}.
 In addition, the \textit{minimization problem} can be extended by the \textit{bias}. Accordingly, the \textit{minimization problem} is then $\min_{p_u, q_i, b_u, b_i}{\sum_{(u,i) \in \mathcal{B}} (r_{ui} - \hat{r}_{ui})^{2} + \lambda(\lVert q_i \rVert^2 + \lVert p_u \rVert^2 + b_u^2 + b_i^2)}$. Analogous to the \textit{regulated matrix-factorization}, the values $b_u$ and $b_i$ are penalized in addition to $\lVert q_i \rVert, \lVert p_u \rVert$. In this case $b_u, b_i$ are penalized more if they assume a large value and thus deviate strongly from the \textit{global average}.
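The bias terms $\mu$, $b_u$, and $b_i$ and the resulting biased prediction can be sketched as follows; the tiny rating matrix and the factor vectors are invented for illustration (0 marks an unknown rating):

```python
import numpy as np

# Tiny illustrative rating matrix: rows = users, columns = items, 0 = unknown.
R = np.array([[4.0, 0.0, 3.0],
              [5.0, 4.0, 0.0]])
known = R > 0

mu = R[known].mean()                    # global average of all known ratings

u, i = 0, 0
b_u = R[u][known[u]].mean() - mu        # user's deviation from the global average
b_i = R[:, i][known[:, i]].mean() - mu  # item's deviation from the global average

# Biased prediction: b_ui = mu + b_u + b_i, plus the inner product of
# (invented) factor vectors.
p_u = np.array([0.1, -0.2])
q_i = np.array([0.3, 0.5])
r_hat = mu + b_u + b_i + q_i @ p_u
```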
 
-\subsubsection{Advanced Matrix-Factorization}
+\subsubsection{Advanced Matrix-Factorization}\label{subsec:amf}
 This section is intended to show that there are \textit{other approaches} to \textit{matrix-factorization}.
 For example, \textit{implicit data} can also be included.
 First of all, it should be mentioned that \textit{temporal dynamics} can be taken into account.
@@ -53,4 +53,25 @@ As a second possibility, \textit{implicit influence} can be included. This can i
 An important point that does not emerge from the above points is the question of how the individual components $p_u, q_i, b_u, b_i$ are constructed. In the following, the three most common methods are presented.
 
 \subsubsection{Stochastic Gradient Descent}
-The best known and most common method when it comes to \textit{machine learning} is \textit{stochastic gradient descent (SGD)}. The goal of \textit{SGD} is to \textit{minimize} the \textit{error} of a given \textit{objective function}. Thus the estimators mentioned in section \ref{sec:mf} can be used as \textit{objective functions}. In the field of \textit{recommender systems}, \citet{Funk06} presented a \textit{modified} variant of \textit{SGD} in the context of the \textit{Netflix Challenge}. This can be applied to \textit{regulated matrix-factorization} with \textit{bias} as well as without \textit{bias}. This method can be described by the following pseudo code:
\ No newline at end of file
+The best-known and most common method in \textit{machine learning} is \textit{stochastic gradient descent (SGD)}. The goal of \textit{SGD} is to \textit{minimize} the \textit{error} of a given \textit{objective function}. Thus the estimators mentioned in section \ref{sec:mf} can be used as \textit{objective functions}. In the field of \textit{recommender systems}, \citet{Funk06} presented a \textit{modified} variant of \textit{SGD} in the context of the \textit{Netflix Challenge}. \textit{SGD} can be applied to \textit{regulated matrix-factorization} with \textit{bias} as well as without \textit{bias}. This method can be described by the following pseudocode:
+\begin{algorithm}
+	\caption{SGD of Funk}\label{alg:sgd}
+	\begin{algorithmic}[1]
+		\REQUIRE training set $\mathcal{R}_{train}$, global average $\mu$, initial mean $\mu_0$, initial variance $\sigma^2$, regularization parameter $\lambda$, learning rate $\gamma$, embedding dimension $f$, epochs to train $n_{epochs}$
+		\STATE $\mathcal{P} \leftarrow \mathcal{N}(\mu_0, \sigma^2)^{|\mathcal{U}|\times f}$
+		\STATE $\mathcal{Q} \leftarrow \mathcal{N}(\mu_0, \sigma^2)^{f\times |\mathcal{I}|}$
+		\STATE $b_u \leftarrow 0$ for all $u \in \mathcal{U}$, $b_i \leftarrow 0$ for all $i \in \mathcal{I}$
+		\FOR{$epoch \in \lbrace 0, \ldots, n_{epochs}-1\rbrace$}
+			\FOR{$(u,i) \in \mathcal{R}_{train}$}
+				\STATE $\hat{r}_{ui} \leftarrow \mu + b_u + b_i + q_i^Tp_u$
+				\STATE $e_{ui} \leftarrow r_{ui} - \hat{r}_{ui}$
+				\STATE $q_i \leftarrow q_i + \gamma(e_{ui}p_u - \lambda q_i)$
+				\STATE $p_u \leftarrow p_u + \gamma(e_{ui}q_i - \lambda p_u)$
+				\STATE $b_i \leftarrow b_i + \gamma(e_{ui} - \lambda b_i)$
+				\STATE $b_u \leftarrow b_u + \gamma(e_{ui} - \lambda b_u)$
+			\ENDFOR
+		\ENDFOR
+		\RETURN $\mathcal{P}, \mathcal{Q}, b_u, b_i$
+	\end{algorithmic}
+\end{algorithm}
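The pseudocode above translates into a compact NumPy sketch. The function name, default parameter values, and the dict-based rating representation are assumptions made for this sketch, not part of Funk's original description:

```python
import numpy as np

def funk_sgd(ratings, n_users, n_items, f=2, lam=0.02, gamma=0.005,
             n_epochs=50, seed=0):
    """Sketch of Funk-style SGD for biased matrix factorization.

    ratings: dict mapping (u, i) -> r_ui for the known entries.
    Returns the factor matrices P, Q and the bias vectors bu, bi.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(0.0, 0.1, size=(n_users, f))   # user factors
    Q = rng.normal(0.0, 0.1, size=(n_items, f))   # item factors (rows here)
    bu = np.zeros(n_users)                        # user biases
    bi = np.zeros(n_items)                        # item biases
    mu = np.mean(list(ratings.values()))          # global average

    for _ in range(n_epochs):
        for (u, i), r_ui in ratings.items():
            # error of the biased prediction mu + b_u + b_i + q_i^T p_u
            e = r_ui - (mu + bu[u] + bi[i] + Q[i] @ P[u])
            q_old = Q[i].copy()                   # update both from the same error
            Q[i] += gamma * (e * P[u] - lam * Q[i])
            P[u] += gamma * (e * q_old - lam * P[u])
            bi[i] += gamma * (e - lam * bi[i])
            bu[u] += gamma * (e - lam * bu[u])
    return P, Q, bu, bi
```

Note that, unlike the pseudocode, this sketch stores item factors as rows of `Q` for convenience.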
+
+At the beginning, the matrices $\mathcal{P}, \mathcal{Q}$ are filled with \textit{random numbers}. According to \citet{Funk06} this can be done by drawing from a \textit{Gaussian distribution}. Then, for each element in the \textit{training set}, the entries of the corresponding vectors $p_u \in \mathcal{P}, q_i \in \mathcal{Q}$ are recalculated on the basis of the \textit{error} that occurred in an \textit{epoch}. The parameters $\lambda, \gamma$ are introduced to avoid \textit{over}- and \textit{underfitting}. They can be determined using \textit{grid-search} in combination with \textit{k-fold cross-validation}. A \textit{grid} of possible parameters is defined before the analysis; it consists of the sets $\Lambda$ and $\Gamma$. The \textit{grid-search} method then trains the algorithm to be considered with each possible pair $(\lambda \in \Lambda, \gamma \in \Gamma)$. The models trained in this way are then evaluated using a \textit{k-fold cross-validation}: the data set is divided into $k$ equally large parts, and each of the $k$ parts is used once as a test set while the remaining $(k-1)$ parts are used as training data. The average error is then determined over the $k$ \textit{folds} and entered into the \textit{grid}. Thus the pair $(\lambda \in \Lambda, \gamma \in \Gamma)$ with the lowest \textit{error} can be determined.
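The grid-search with k-fold cross-validation described above can be sketched as follows. The candidate sets, the data, and the stand-in "model" (which ignores $\gamma$ entirely) are assumptions for illustration; a real run would train the SGD algorithm with each pair $(\lambda, \gamma)$ inside the loop:

```python
import itertools
import numpy as np

# Hypothetical candidate sets Lambda and Gamma from the text.
Lambda = [0.01, 0.1]
Gamma = [0.001, 0.01]

def kfold_mean_error(data, lam, gamma, k=3):
    """Average test error of a stand-in model over k folds."""
    folds = np.array_split(data, k)
    errors = []
    for j in range(k):
        test = folds[j]
        train = np.concatenate([folds[m] for m in range(k) if m != j])
        # Stand-in model: predict the training mean, shrunk by lam.
        # (A real run would train SGD with (lam, gamma) here.)
        pred = train.mean() / (1.0 + lam)
        errors.append(np.mean((test - pred) ** 2))
    return np.mean(errors)

data = np.array([4.0, 2.0, 5.0, 3.0, 4.0, 1.0])

# Evaluate every pair in the grid and pick the one with the lowest error.
grid = {(lam, gamma): kfold_mean_error(data, lam, gamma)
        for lam, gamma in itertools.product(Lambda, Gamma)}
best = min(grid, key=grid.get)
```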
+This approach, in combination with sections \ref{subsec:rmf} and \ref{subsec:bmf}, is also called \textit{Funk-SVD} or simply \textit{SVD} \citep{Rendle19}. The algorithm shown above can also be extended to solve approaches like those in section \ref{subsec:amf}. The second method from section \ref{subsec:amf} is then also called \textit{SVD++}. A coherent \textit{SGD} approach was given by \citet{Kor11}.
\ No newline at end of file
diff --git a/submission.pdf b/submission.pdf
index bec9a7f5524dbafdf42b131cc3625dd70b574300..63d1731ea1395a28fd0ca046a3b0fb6aca4ffee8 100644
Binary files a/submission.pdf and b/submission.pdf differ