Skip to content

Latest commit

 

History

History
64 lines (49 loc) · 6.98 KB

20160607.md

File metadata and controls

64 lines (49 loc) · 6.98 KB

2016-06-07

good photo vs. bad photo

  1. automatic image tagging technologies are helping tell the stories behind photographs by automatically indexing or tagging them to make them discoverable.
  2. visual aesthetics. how the visual style and composition of an image create an emotional connection with the viewer.
  3. With photography being an artistic medium, it's extremly difficult and nearly impossible to break down aesthetics as a hard set of rules. Our vocabulary for defining what makes a good photo is very limited and most often boils down to personal taste of describing a photo as "good" or "not so good.
  4. To make a computer understand aesthetics in photographs, we train it with a dataset. Understanding and appreciating aesthetics is quite an expert-level task. For this reason, our researchers and photo curators closely collaborated to develop our training data.
  5. The goal behind that approach is to leverage expert opinions over a much larger scale, using more easily available data for tasks like visual aesthetics which require an enormous amount of human intellectual and emotional judgement.
  6. As a child I waited anxiously for the arrival of each new issue of National geogarphic Magazien. The magazine had amazing stories from around the world, but the stunningly beautiful photographs were more important to me. `The colors, shadows and composition intrigued and wowed me, and there was a cohesion of visual arrangement and storytelling.
  7. There are no rules for good photographs, there are only good photographs.
  8. The question then to ask is what do the master photographers' images have in common, and what separates them from the amateur images? While it's difficult for a computer to answer a philosophical question, if we express the question in mathematical terms we can attempt to solve it computationally. In our case, we aim to find a mathematical space in which the distance between the wo images that are aesthetically pleasing is less than the distance between an aesthetically pleasing one and a mediocre one. Embedding photographs in a mathematical space in which aesthetically peleasing phtos are nearby and far from less pleasing photos.
  9. Photographs can remain perceptually and conceptually unperturbed by small changes in their dimensions, colors and intensity patterns, and it's often not clear what changes contribute to a change in the actual information contained a photo. Oftern our understanding of a picture is influenced heavily by our previous visual stimuli. A much stronger strategy is to transform the image via vairous mathematical operators into a set of measurements called image features that summarize the visual information that is relevant to our solution, and disregard irrelvant information.
  10. $$loss = max(0, c + dist(/phi(I_1, /phi(I_2) - dist(/phi(I_1), /phi(I_3))$$ Researchers have explored such loss functions in image similarity, metric and face identification setting with greate success in the past. In a large context, such energy functions fall into the class of implicit regression, where the loss function penalizes the constraint that input vairables must satisfy.
  11. During training, we pass each image of the triplet trough a separate convolutional neural network. We eperimented with various CNN architectures. An 11 layer Oxford model was sufficient for godd convergence and performance. Though there is no direct theoretical motivation, we enforced the weights of the three networks to be the same, since we found this mad the networks converge easier. Further, at run time. this allowed us to use a single convoluional network, thus keeping our run time cost low.
  12. TrainSet: Over 100,000 images were categorized. use more than 60 million such triples for training the final model over 20,000 iteratations. We trained our networks using a single K40 card which converged in 2 to 3 days.
  13. As a photography-first company, we validate our assumptions with our internal curators and reviewers, who spend the major part of their working day curating photgrapgs. We track the time they spend on curation tasks where the image list is prioritized via the aesthetic rank vs default sort order. Using deep learning, we have been able to reduce curation time by 80%.
  14. Our initial deep learning model using the triple hingle loss function enables a first step in capturing some of the intricacies of these tasks.

Face Application

  1. OpenFace is a very good resource for face application.
  • Detect faces with a pre-trained models from dlib or OpenCV
  • Transform the face for the neural network. This repository uses dlib's real time pose estimation with OpenCV's affine transformation to try to make the eyes and bottom lip appear in the same location on each image.
  • Use a deep nerual network to represent (or embed) the face on a 128-dimensional unit hypersphere. the embedding is a generic representation for anybody's face. Unlike other face representations, this embedding has th nice property that a large distance between two face embeddings means that the faces are likely not of the same person. This property makes clustering, similarity detection, and classification tasks easier than other face recognition techniques where the Euclidean distance between features is not meaningful.
  • Apply your favorite clustering or classification techniques to the features to complete your recognition task.

Extract feautre with Caffe

  1. use extract_feature.bin
  2. Reading LMDB data from Python is very easy: http://lmdb.readthedocs.org/en/release/

Python

  1. tab default: equls 8 space

Machine learning software

  1. OpenCV ml module
  2. Python scikit-learn

Makefile

CXX = g++
CXX_FLAG = -std=c++11

Hinge loss

  1. In machine learning, the hinge loss is a loss function used for training classifiers.The honge loss is used for "maximum-margin" classification, most notably for support vector machines.

how to read caffe

  1. Having a good intuition of what the program does and does not do is a good first step to being able to understand the source code, rather than figuring out what the executable is supposed to do just given the code.
  2. Caffe is written mostly in C++, with python/MATLAB as scripting interfaces and some GPU kernels written in CUDA C/C++. For the most part you only need to worry about the C++ parts.
  • Understand what the Makefiles are doing. Running make actally builds serveral executables - caffe being one of them, but each of the cpp files in "caffe/tools" contains it's own "main" function.
  • Stepping it through it line by line. You can do this with lldb/gdb, but I personally prefere to compile it using QT Creator or Eclipse (both support importing Makefile projects).

DataSet

  1. Kaggle Dog vs. Cat

Samba

  1. Samba is the standard Windows interoperablity suite of programs for Linux and Unix.
  2. Samba is an important component to seamlessly integrate Linux/Unix Servers and Desktops into Active Directory environents. It can function both as a domain controller or as a regular domain member.