Commit 67429e29 authored by Marc Feger

Add more text for experiments

parent 2c183bd0
......@@ -3,6 +3,16 @@ This section reviews the \textit{main part} of the work represented by \citet{Re
\subsection{Motivation and Background}
As in many other fields of \textit{data-science}, a valid \textit{benchmark-dataset} is required to carry out experiments properly. In the field of \textit{recommender systems}, the best known \textit{datasets} are the \textit{Netflix-} and \textit{MovieLens-dataset}. This section introduces both \textit{datasets} and describes the involvement of \citet{Koren}, one of the authors of this paper, in the \textit{Netflix-Prize}, along with the existing \textit{baselines}.
\subsubsection{Netflix-Prize}
The topic of \textit{recommender systems} first gained wide attention through the \textit{Netflix-Prize}. On \textit{October 2, 2006}, the competition announced by \textit{Netflix} began with the \textit{goal} of improving on the \textit{RMSE} of \textit{0.9514} achieved by its in-house \textit{recommender system Cinematch} by at least \textit{10\%}.
In total, the \textit{Netflix-dataset} was divided into three parts that can be grouped into two categories: \textit{training} and \textit{qualification}. In addition to a \textit{probe-dataset} for \textit{training} the algorithms, two further datasets were held back to qualify the winners. The \textit{quiz-dataset} was used to calculate the \textit{score} of the \textit{submitted solutions} on the \textit{public leaderboard}. In contrast, the \textit{test-dataset} was used to determine the \textit{actual winners}. Each of the three parts contained around \textit{1,408,000 ratings} and had \textit{similar statistical properties}. Because the leaderboard score and the final ranking were computed on different data, an improvement could not be achieved by \textit{simple hill-climbing-algorithms} that merely tune a submission against the leaderboard feedback.
It took a total of \textit{three years} and \textit{several hundred models} until the team \textit{``BellKor's Pragmatic Chaos''} was declared the \textit{winner} on \textit{September 21, 2009}. They achieved an \textit{RMSE} of \textit{0.8554} and thus an \textit{improvement} of \textit{0.096}. Such a result is remarkable, considering that it took \textit{one year} of work and intensive research to reduce the \textit{RMSE} from \textit{0.8712 (progress award 2007)} to \textit{0.8616 (progress award 2008)}.
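To put these figures into perspective, the \textit{RMSE} can be written as follows. The notation is an assumption made here for illustration: $\mathcal{T}$ denotes the set of evaluated user-item pairs, $r_{ui}$ the true rating and $\hat{r}_{ui}$ the predicted rating.
\[
\mathrm{RMSE} = \sqrt{\frac{1}{|\mathcal{T}|} \sum_{(u,i) \in \mathcal{T}} \left( \hat{r}_{ui} - r_{ui} \right)^2}
\]
Under this reading, a \textit{10\%} improvement over \textit{Cinematch} corresponds to a target of roughly $0.9 \cdot 0.9514 \approx 0.8563$, and the winning score amounts to a relative improvement of about
\[
\frac{0.9514 - 0.8554}{0.9514} = \frac{0.0960}{0.9514} \approx 10.09\%.
\]
By comparison, the step between the two progress awards was only $0.8712 - 0.8616 = 0.0096$.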
The \textit{co-author} of the present paper, \citet{Koren}, was significantly involved in the work of this team. Since the beginning of the competition, \textit{matrix-factorization methods} have been regarded as promising approaches. Even with the simplest \textit{SVD} methods, \citet{Kurucz07} achieved \textit{RMSE values} of \textit{0.94}.
The \textit{breakthrough} came with \citet{Funk06}, who achieved an \textit{RMSE} of \textit{0.93} with his \textit{FunkSVD}.
Building on this, more and more work went into research on simple \textit{matrix-factorization methods}.
Thus, \citet{Zh08} presented an \textit{ALS variant} with an \textit{RMSE} of \textit{0.8985} and \citet{Koren09} presented an \textit{SGD variant} with \textit{RMSE 0.8995}.
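As a rough sketch of what such \textit{matrix-factorization methods} have in common (the notation is an assumption made here and is not taken from the cited works): each user $u$ and each item $i$ is assigned a latent vector $p_u$ and $q_i$, a rating is predicted as $\hat{r}_{ui} = p_u^\top q_i$, and the vectors are fitted by minimizing a regularized squared error over the set $\mathcal{K}$ of known ratings with a regularization weight $\lambda$,
\[
\min_{p, q} \sum_{(u,i) \in \mathcal{K}} \left( \left( r_{ui} - p_u^\top q_i \right)^2 + \lambda \left( \| p_u \|^2 + \| q_i \|^2 \right) \right).
\]
An \textit{SGD variant} typically processes one known rating at a time, using a learning rate $\gamma$ and the prediction error $e_{ui} = r_{ui} - \hat{r}_{ui}$,
\[
p_u \leftarrow p_u + \gamma \left( e_{ui} q_i - \lambda p_u \right), \qquad q_i \leftarrow q_i + \gamma \left( e_{ui} p_u - \lambda q_i \right),
\]
whereas an \textit{ALS variant} alternately fixes all $p_u$ or all $q_i$ and solves the resulting least-squares problem for the other set of vectors.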
\textit{Implicit data} were also used. For example, \citet{Koren09} achieved an \textit{RMSE} of \textit{0.8762} by extending \textit{SVD++} with a \textit{time variable}. This model was then called \textit{timeSVD++}.
The \textit{Netflix-Prize} made it clear that even the \textit{simplest methods} are \textit{not trivial} and that a \textit{reasonable investigation} and \textit{evaluation} require an \textit{immense effort} from within the \textit{community}.
\subsubsection{MovieLens}
\subsection{Experiment Realization}
\subsubsection{Experiment Preparation}
......
......@@ -123,3 +123,18 @@ doi = {10.1145/1390156.1390267}
howpublished = {\url{https://ieeexplore.ieee.org/author/37414256700}},
note = {Accessed: 2019-12-21},
}
@article{Kurucz07,
author = {Miklós Kurucz and András Benczúr and Károly Csalogány},
year = {2007},
month = {01},
pages = {},
title = {Methods for large scale SVD with missing values},
journal = {ACM KDDCup 2007}
}
@article{Koren09,
author = {Yehuda Koren},
year = {2009},
month = {09},
pages = {},
title = {The BellKor solution to the Netflix Grand Prize}
}
\ No newline at end of file