Learning good, compact representations from RGB observations of a latte-art-making task