Update the structure of the package
adler-j committed Dec 5, 2017
1 parent f1cc02f commit 1e4d1f7
Showing 7 changed files with 230 additions and 48 deletions.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
210 changes: 162 additions & 48 deletions code/classify_mnist.ipynb → code/part2_classification.ipynb
@@ -46,6 +46,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": true
},
"outputs": [],
@@ -94,6 +95,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": true
},
"outputs": [],
@@ -116,6 +118,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": false
},
"outputs": [],
@@ -165,28 +168,6 @@
" return np.mean(result == test_labels)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create placeholders. Placeholders are needed in tensorflow since tensorflow is a lazy language,\n",
"and hence we first define the computational graph with placeholders as input, and later we evaluate it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": true
},
"outputs": [],
"source": [
"with tf.name_scope('placeholders'):\n",
" images = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])\n",
" true_labels = tf.placeholder(tf.int32, shape=[None])"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -208,7 +189,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"toh = tf.one_hot([0, 1, 2], depth=3)\n",
@@ -268,25 +251,40 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"with tf.name_scope('elementary'):\n",
"with tf.name_scope('elementary_network'):\n",
" # Create a placeholder for our input data (no computation is done here)\n",
" X = tf.placeholder(shape=(None, 784), dtype=tf.float32, name=\"X\")\n",
" \n",
" # Create the parameters (weight, bias) of the model\n",
" weights = tf.Variable(tf.random_normal((784, 10)), name=\"weights\")\n",
" bias = tf.Variable(tf.zeros((10)), name=\"bias\")\n",
" lin = tf.matmul(X, weights)\n",
" lin_ = lin + bias\n",
" elin_ = tf.exp(lin_)\n",
" Z = tf.reduce_sum(tf.exp(lin_), axis=1, keep_dims=True)\n",
" prob = elin_ / Z\n",
" \n",
" # Compute the probabilities (this is all lazy, no computations are actually performed)\n",
" lin = tf.matmul(X, weights) + bias\n",
" elin = tf.exp(lin)\n",
" Z = tf.reduce_sum(elin, axis=1, keep_dims=True)\n",
" prob = elin / Z\n",
" log_prob = tf.log(prob)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define the loss function which measures how good our parameters are"
]
},
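The quantity built in the next cell is the average cross-entropy between the one-hot labels and the predicted probabilities. As a rough sketch (the code averages over all entries rather than summing over classes, which only rescales the loss by a constant):

$$L(w, b) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{10} y_{i,c} \log p_{i,c}$$

where $y_{i,c}$ is the one-hot encoding of the label of example $i$ and $p_{i,c}$ is the probability the model assigns to class $c$.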
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"with tf.name_scope(\"elementary_loss\"):\n",
@@ -295,40 +293,73 @@
" loss = -tf.reduce_mean(determ*log_prob)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define the gradient descent update step, i.e.\n",
"\n",
"$$w_i \\leftarrow w_i - \\omega \\nabla_{w_i} L(w, b)$$\n",
"\n",
"where $\\omega$ is the *learning rate*, or step size.\n",
"\n",
"Note that in machine learning, we typically use *stochastic* gradient descent (SGD). In these methods we don't use all of the data to compute the gradient, only a small subset called a mini-batch. Here we use 128 images in each training step.\n",
"\n",
"Further, while for this case computing the gradient would be quite simple, once we move to harder and mroe complicated models doing so would be basically impossible to do by hand. To work around this, all major deep learning frameworks implement [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation). This may sound fancy, but automatic differentiation is simply the chain rule for the derivative. Tensorflow implements it using the `tf.gradients` command."
]
},
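To make `tf.gradients` concrete, here is a minimal, self-contained sketch (a hypothetical toy example, not part of the notebook) that uses automatic differentiation and the same assign-style update to minimize a simple quadratic:

```python
import tensorflow as tf

# Toy problem: minimize f(w) = (w - 3)^2 by gradient descent.
w = tf.Variable(0.0, name='w')
loss = (w - 3.0) ** 2

# Automatic differentiation: TensorFlow applies the chain rule through the graph.
grad = tf.gradients(loss, [w])[0]

learning_rate = 0.1
update_op = w.assign(w - learning_rate * grad)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(50):
        sess.run(update_op)
    print(sess.run(w))  # close to 3.0
```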
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"with tf.name_scope(\"elementary_training\"): \n",
"with tf.name_scope(\"elementary_training\"):\n",
" learning_rate = .1\n",
" batch_size = 2**7\n",
" batch_size = 128\n",
"\n",
" variables = [weights, bias]\n",
" gradients = tf.gradients(loss, variables)\n",
" update_ops = [var.assign(var - learning_rate*grad) \n",
" for var, grad in zip(variables, gradients)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since all the code above was lazy, nothing has actually happened. Before we start we need to initialize the variables"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"init = tf.global_variables_initializer().run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We train the network by feeding data from the training set and occationally evalute the performance on our test set, this is the first point we actually start doing computations"
]
},
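The diff only shows part of the training loop below. As a rough sketch of what such a feed-and-evaluate loop typically looks like, assuming the running session, the placeholders `X` and `labels`, the `update_ops`, `prob`, and the `evaluate` helper defined in the surrounding cells (a sketch, not the notebook's exact code):

```python
# Sketch only: feed mini-batches, occasionally report test accuracy.
for i in range(100000):
    images_, labels_ = mnist.train.next_batch(batch_size)
    session.run(update_ops, feed_dict={X: images_, labels: labels_})

    if i % 1000 == 0:
        accuracy = evaluate(tf.argmax(prob, axis=1), X)
        print('step {}: test accuracy {:.3f}'.format(i, accuracy))
```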
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": false
},
"outputs": [],
"source": [
"feed_dict={labels:mnist.train.labels[:batch_size], X:mnist.train.images[:batch_size]}\n",
"for i in range(100000):\n",
" images_, labels_ = mnist.train.next_batch(batch_size)\n",
" session.run(update_ops, \n",
@@ -342,33 +373,115 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using TensorFlow libraries"
"### Using TensorFlow libraries\n",
"\n",
"While the above code solves our problem, it involved several small and perhaps obscure steps. Once we start moving to more complicated neural networks the code would become very repetetive.\n",
"\n",
"Since all of the steps are standardized, we can (and should) instead use built in tensorflow functions, this example does that, and all following examples will do the same."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Placeholders"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": true
},
"outputs": [],
"source": [
"with tf.name_scope('placeholders'):\n",
" images = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])\n",
" true_labels = tf.placeholder(tf.int32, shape=[None])"
]
},
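As a reminder of how a placeholder receives data at run time, here is a minimal, self-contained sketch (a toy example, independent of the notebook's variables):

```python
import numpy as np
import tensorflow as tf

# A placeholder is a graph input with no value until the graph is run.
x = tf.placeholder(tf.float32, shape=[None, 3])
doubled = 2.0 * x

with tf.Session() as sess:
    # The value is supplied through feed_dict at run time.
    print(sess.run(doubled, feed_dict={x: np.ones((2, 3), dtype=np.float32)}))
```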
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Network\n",
"\n",
"The \"network\" can be computed using the `tf.contrib.layers.fully_connected` function, which computes\n",
"\n",
"$$\\rho(Ax + b)$$\n",
"\n",
"where $\\rho$ is the activation function, $A$ the weights and $b$ the bias. Note that here we never explicitly construct these, they are hidden inside tensorflow."
]
},
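As a sketch of roughly what `fully_connected(x, 10, activation_fn=None)` builds internally (illustrative only, not the library's actual implementation, and assuming `x` is the flattened image batch from the cell below):

```python
# Illustrative equivalent of a fully connected layer with no activation.
num_inputs = 28 * 28   # flattened image size
num_outputs = 10

A = tf.Variable(tf.random_normal((num_inputs, num_outputs)), name='A')  # weights
b = tf.Variable(tf.zeros([num_outputs]), name='b')                      # bias
logits_manual = tf.matmul(x, A) + b  # rho(Ax + b) with rho = identity
```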
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"with tf.name_scope('logistic_regression'):\n",
" x = tf.contrib.layers.flatten(images)\n",
" logits = tf.contrib.layers.fully_connected(x, 10,\n",
" activation_fn=None)\n",
" pred = tf.argmax(logits, axis=1)\n",
" \n",
" activation_fn=None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Loss and optimization\n",
"\n",
"The loss function defined above should be done using the `tf.nn.softmax_cross_entropy_with_logits` function, which not only is easier to use, it is also more numerically stable"
]
},
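A small sketch of why the fused op is preferred numerically (toy tensors; the shift-by-max trick illustrates the idea and is not claimed to be TensorFlow's exact implementation):

```python
# Naive softmax + log overflows for large logits; shifting by the max does not.
z = tf.constant([[1000.0, 0.0, -1000.0]])   # logits
y = tf.constant([[1.0, 0.0, 0.0]])          # one-hot label

# Naive: exp(1000) overflows to inf, giving nan.
p_naive = tf.exp(z) / tf.reduce_sum(tf.exp(z), axis=1, keep_dims=True)
loss_naive = -tf.reduce_sum(y * tf.log(p_naive), axis=1)

# Stable: subtract the max logit before exponentiating.
z_shift = z - tf.reduce_max(z, axis=1, keep_dims=True)
log_prob = z_shift - tf.log(tf.reduce_sum(tf.exp(z_shift), axis=1, keep_dims=True))
loss_stable = -tf.reduce_sum(y * log_prob, axis=1)

# The fused op handles this internally.
loss_fused = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=z)
# session.run(...) gives [nan] for loss_naive but [0.] for loss_stable and loss_fused.
```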
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"with tf.name_scope('optimizer'):\n",
" one_hot_labels = tf.one_hot(true_labels, depth=10)\n",
" \n",
" loss = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_labels,\n",
" logits=logits)\n",
" optimizer = tf.train.AdamOptimizer().minimize(loss)\n",
" optimizer = tf.train.AdamOptimizer().minimize(loss)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"session.run(tf.global_variables_initializer())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train the network\n",
"\n",
"Training the network looks about the same as above"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": true
},
"outputs": [],
"source": [
"# Initialize all TF variables\n",
"session.run(tf.global_variables_initializer())\n",
"\n",
"for i in range(10000):\n",
" batch = mnist.train.next_batch(128)\n",
" train_images = batch[0].reshape([-1, 28, 28, 1])\n",
Expand All @@ -377,9 +490,8 @@
" session.run(optimizer, feed_dict={images: train_images, \n",
" true_labels: train_labels})\n",
"\n",
" if i % 100 == 0:\n",
" print('{} Average correct: {}'.format(\n",
" i, evaluate(pred, images)))"
" if i % 1000 == 0:\n",
" print(\"{:.1f}%, \".format(evaluate(tf.argmax(logits, axis=1), X)*100), end=\"\")"
]
},
{
@@ -401,6 +513,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": false
},
"outputs": [],
@@ -451,6 +564,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": true
},
"outputs": [],
@@ -505,7 +619,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
"version": "3.5.3"
}
},
"nbformat": 4,