

Practice of DeepID for Face Classification

Organization / Author: stdcoutzyx

An implementation of DeepID using Theano.

You can also read a Chinese version of this document, or find it on my blog.



You have to install Theano and its related libraries. The Theano documentation provides enough information for this, so I will assume that all readers have installed Theano correctly.

Implemented Programmes

The structure of my code looks like this:


As the folder names imply, my code has two modules that do not reference each other. The data_prepare module prepares the data, and the conv_net module is the implementation of DeepID.

Data Preparation


Two things are important and necessary for the strong performance of DeepID: the structure of the convolutional neural network and the data.

I asked the author for the data but got nothing except a polite reply, so my experiments use other data instead.

Take the YouTube face data as an example. There are three levels of folders, as shown below:


The first thing to do is to separate the data into a training set and a validation set. I choose them as follows:

  • Mix the images of the same person from different videos together.
  • Shuffle them randomly.
  • Choose the first 5 images as the validation set.
  • Choose the next 20 images (the 6th to the 25th) as the training set.

In the end, I get 7975 images as the validation set and 31900 images as the training set. Since each person contributes 5 validation images and 20 training images, there are 7975 / 5 = 31900 / 20 = 1595 classes (persons) in total.
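The per-person split described above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `split_person_images`, the `seed` parameter and the file names are hypothetical.

```python
import random

def split_person_images(image_paths, n_valid=5, n_train=20, seed=0):
    """Shuffle one person's images, then split them into validation and training lists."""
    paths = list(image_paths)            # copy so the caller's list is not modified
    random.Random(seed).shuffle(paths)   # seeded shuffle, so the split is reproducible
    valid = paths[:n_valid]                    # first 5 images
    train = paths[n_valid:n_valid + n_train]   # next 20 images
    return valid, train

# Hypothetical file names for one person, with images from all videos mixed together.
person_images = ["img_%03d.jpg" % i for i in range(40)]
valid, train = split_person_images(person_images)
print(len(valid), len(train))
```

Applied to 1595 persons, this gives 1595 × 5 = 7975 validation images and 1595 × 20 = 31900 training images, matching the counts above.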

Usage of Code

Note: the files prefixed with youtube are specific to the YouTube data because of its folder structure and image properties. If you want to deal with another dataset, please read the code of * and * and re-implement them. I believe the code is readable and easy to understand.

Used to crop the face out of the image. Faces in the YouTube data have already been aligned to the center of the image, so this programme increases the proportion of the image occupied by the face and resizes the image to (47,55), which is the input size for DeepID.

Usage: python aligned_db_folder new_folder
  • aligned_db_folder: source folder
  • new_folder: the programme recreates the folder structure of the source folder here, with all images processed to the new size.
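Because the faces are already centered, the crop reduces to computing a centered box. The sketch below shows only that arithmetic. The `keep_ratio` value is a hypothetical parameter (the text does not state how much of the image is kept), and reading (47,55) as (width, height) is also an assumption.

```python
def center_crop_box(width, height, keep_ratio=0.5):
    """Return (left, upper, right, lower) of a centered box that keeps
    `keep_ratio` of each side of the image."""
    crop_w = int(round(width * keep_ratio))
    crop_h = int(round(height * keep_ratio))
    left = (width - crop_w) // 2
    upper = (height - crop_h) // 2
    return left, upper, left + crop_w, upper + crop_h

# Target size from the text; the (width, height) order is assumed.
DEEPID_INPUT_SIZE = (47, 55)

box = center_crop_box(200, 200, keep_ratio=0.5)
print(box)
```

With an image library such as Pillow, a box like this could be passed to `Image.crop` and the result to `Image.resize(DEEPID_INPUT_SIZE)`.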

Used to split the data into two sets, one for training and one for validation.

Usage: python src_folder test_set_file train_set_file

The format of test_set_file and train_set_file is shown below. Each line has two parts: the first is the path of the image, and the second is its label.
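A small sketch of writing and reading such lines. The tab separator, the integer label and the example path are assumptions made for illustration. The actual files may use a different separator.

```python
def format_line(path, label):
    """Build one 'path<TAB>label' line (the tab separator is an assumption)."""
    return "%s\t%d" % (path, label)

def parse_line(line):
    """Split a line into (path, label). Splitting from the right on whitespace
    keeps paths that contain spaces intact."""
    path, label = line.rstrip("\n").rsplit(None, 1)
    return path, int(label)

# Hypothetical path and label.
line = format_line("person_a/video_0/frame 001.jpg", 7)
print(parse_line(line))
```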


Used to vectorize the images, turning the thousands of images into a two-dimensional array of size (m, n), where m is the number of samples and n is 47×55×3 = 7755.

To avoid producing a very large file, the programme automatically separates the data into batches of 1000 samples each.

Usage: python test_set_file train_set_file test_vector_folder train_vector_folder
  • test_set_file: generated by *
  • train_set_file: generated by *
  • test_vector_folder: the name of the folder that stores the vector files of the validation set
  • train_vector_folder: the name of the folder that stores the vector files of the training set
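The vectorizing and batching steps can be illustrated in plain Python. This is a sketch with hypothetical helper names (`flatten_image`, `iter_batches`) that uses nested lists in place of real image arrays.

```python
def flatten_image(image):
    """Flatten a nested [row][column][channel] image into one flat list."""
    return [value for row in image for pixel in row for value in pixel]

def iter_batches(samples, batch_size=1000):
    """Yield consecutive slices of at most `batch_size` samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

WIDTH, HEIGHT, CHANNELS = 47, 55, 3
# A dummy all-black image standing in for a real face image.
image = [[[0] * CHANNELS for _ in range(WIDTH)] for _ in range(HEIGHT)]
vector = flatten_image(image)
print(len(vector))  # 7755

# Batching the 31900 training samples (integers stand in for the vectors).
batch_sizes = [len(batch) for batch in iter_batches(list(range(31900)))]
print(len(batch_sizes), batch_sizes[-1])
```

With 1000 samples per batch, the 31900 training images fall into 32 batches, the last one holding 900 samples.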



Now it's the exciting time.

In the conv_net module, there are five programme files.

  • definitions of the different layer types, including LogisticRegression, HiddenLayer, LeNetConvLayer, PoolLayer and LeNetConvPoolLayer.
  • loads the data for the main programme.
  • some test functions that validate the correctness of the layers defined in the layer definition file.
  • the DeepID main programme.
  • extracts the hidden layer using the trained parameters.

Usage of Code

Usage: python vec_valid vec_train params_file
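  • vec_valid: the validation vector folder generated by the vectorizing programme in the data_prepare module
  • vec_train: the training vector folder generated by the vectorizing programme in the data_prepare module
  • params_file: stores the trained parameters of every iteration. It can be used to recover if your computer shuts down unexpectedly, and to extract the hidden layer of the net.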
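To illustrate how a parameter file can hold one snapshot per epoch and still be read back after an interruption, here is a minimal sketch using `pickle`. The record layout and the function names are assumptions. The real params_file may be organized differently.

```python
import os
import pickle
import tempfile

def save_params(params_file, epoch, params):
    """Append one epoch's parameters to the checkpoint file."""
    with open(params_file, "ab") as f:
        pickle.dump({"epoch": epoch, "params": params}, f)

def load_latest_params(params_file):
    """Read every appended record and return the last one (None if the file is empty)."""
    latest = None
    with open(params_file, "rb") as f:
        while True:
            try:
                latest = pickle.load(f)
            except EOFError:
                break
    return latest

# Dummy parameters standing in for the trained weights.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "params.pkl")
save_params(checkpoint_path, 0, [[0.1, 0.2]])
save_params(checkpoint_path, 1, [[0.3, 0.4]])
latest = load_latest_params(checkpoint_path)
print(latest["epoch"])
```

Appending a record per epoch means a crash loses at most the epoch in progress.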

Note: DeepID has many parameters to adjust, so to keep the command line simple I did not expose them there. If you want to change the number of epochs, the learning rate, the batch size and so on, please change them in the last line of the file.

You can extract the 160-dimensional hidden layer with the command below:

Usage: python dataset_folder params_file result_folder
  • dataset_folder: it can be the folder of train set or valid set.
  • params_file: produced by the DeepID main programme
  • result_folder: contains files with the same names as those in dataset_folder, but the dimension of x in each file is num_samples×160 instead of num_samples×7755.


DeepID performance

After running the DeepID main programme, you will get output like the following. The first part is the training error and validation error of each epoch; the second part summarizes the epoch, training error and validation error.

epoch 15, train_score 0.000444, valid_score 0.066000
        epoch 16, minibatch_index 62/63, error 0.000000
epoch 16, train_score 0.000413, valid_score 0.065733
        epoch 17, minibatch_index 62/63, error 0.000000
epoch 17, train_score 0.000508, valid_score 0.065333
        epoch 18, minibatch_index 62/63, error 0.000000
epoch 18, train_score 0.000413, valid_score 0.070267
        epoch 19, minibatch_index 62/63, error 0.000000
epoch 19, train_score 0.000413, valid_score 0.064533

0 0.974349206349 0.962933333333
1 0.890095238095 0.897466666667
2 0.70126984127 0.666666666667
3 0.392031746032 0.520133333333
4 0.187619047619 0.360666666667
5 0.20526984127 0.22
6 0.054380952381 0.171066666667
7 0.0154920634921 0.128
8 0.00650793650794 0.100133333333
9 0.00377777777778 0.0909333333333
10 0.00292063492063 0.086
11 0.0015873015873 0.0792
12 0.00133333333333 0.0754666666667
13 0.00111111111111 0.0714666666667
14 0.000761904761905 0.068
15 0.000444444444444 0.066
16 0.000412698412698 0.0657333333333
17 0.000507936507937 0.0653333333333
18 0.000412698412698 0.0702666666667
19 0.000412698412698 0.0645333333333

You can also plot the second part of the output as a figure with matplotlib.
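A possible way to do this, sketched under the assumption that the summary lines have been saved as text: `parse_summary` keeps only lines of the form "epoch train_error valid_error", and `plot_errors` draws both curves. Both function names and the output file name are hypothetical.

```python
def parse_summary(text):
    """Return (epoch, train_error, valid_error) tuples from the summary lines,
    skipping any line that does not have exactly three numeric fields."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        try:
            rows.append((int(parts[0]), float(parts[1]), float(parts[2])))
        except ValueError:
            continue
    return rows

def plot_errors(rows, out_path="deepid_errors.png"):
    """Plot training and validation error per epoch (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt
    epochs = [row[0] for row in rows]
    plt.plot(epochs, [row[1] for row in rows], label="train error")
    plt.plot(epochs, [row[2] for row in rows], label="valid error")
    plt.xlabel("epoch")
    plt.ylabel("error")
    plt.legend()
    plt.savefig(out_path)

# The last lines of the output shown above.
sample = """epoch 19, train_score 0.000413, valid_score 0.064533
17 0.000507936507937 0.0653333333333
18 0.000412698412698 0.0702666666667
19 0.000412698412698 0.0645333333333"""
rows = parse_summary(sample)
print(rows[-1])
```

Calling `plot_errors(rows)` then writes the figure to disk.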

[Figure: DeepID on YouTube]

Generated Feature performance

After running the hidden-layer extraction programme, you will get output like the following:

loading data of vec_test/0.pkl
    building the model ...
    generating ...
    writing data to deepid_test/0.pkl
loading data of vec_test/3.pkl
    building the model ...
    generating ...
    writing data to deepid_test/3.pkl
loading data of vec_test/1.pkl
    building the model ...
    generating ...
    writing data to deepid_test/1.pkl
loading data of vec_test/7.pkl
    building the model ...
    generating ...
    writing data to deepid_test/7.pkl

The programme extracts features from each sub-file of the vectorized data.

After extracting the hidden layer, we can do other things to demonstrate the effectiveness of the DeepID feature. For example, for feature retrieval, you can use another GitHub project of mine to test on the data generated in this project.

For comparison, I have done two face retrieval experiments on the YouTube face data.

  • PCA experiment. Reduce the vectorized image data to 160 dimensions with PCA, then run the face retrieval experiment on the result.
  • DeepID experiment. Run the face retrieval experiment directly on the extracted DeepID hidden-layer features.

Note: in both experiments, I use the cosine distance to measure the similarity of two vectors.
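For reference, the cosine similarity of two vectors a and b is a·b / (‖a‖‖b‖), and the cosine distance is one minus that value. A minimal pure-Python sketch (the function names are illustrative, not taken from the retrieval project):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for parallel vectors, 1 for orthogonal ones."""
    return 1.0 - cosine_similarity(a, b)

print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Because the measure ignores vector length, two features that differ only in scale have distance 0.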

Results of face retrieval are below:

Precision   Top-1    Top-5    Top-10
PCA         95.20%   96.75%   97.22%
DeepID      97.27%   97.93%   98.25%

AP          Top-1    Top-5    Top-10
PCA         95.20%   84.19%   70.66%
DeepID      97.27%   89.22%   76.64%

Precision counts a query as correct if at least one photo in the top-N results shows the same person as the query image. AP instead measures how many of the top-N results show the same person as the query image.
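The two measures can be written down for a single query as below. The function names and labels are hypothetical, and the sketch assumes that each score is averaged over all queries and that at least N results are returned. At Top-1 the two measures coincide, which agrees with the identical Top-1 values in the tables.

```python
def top_n_hit(query_label, ranked_labels, n):
    """Precision as defined above: 1.0 if any of the top-n results
    shares the query's identity, else 0.0."""
    return 1.0 if query_label in ranked_labels[:n] else 0.0

def top_n_match_fraction(query_label, ranked_labels, n):
    """AP as defined above: the fraction of the top-n results
    that share the query's identity."""
    matches = sum(1 for label in ranked_labels[:n] if label == query_label)
    return matches / float(n)

# Hypothetical retrieval result for one query of person "A", best match first.
ranked = ["B", "A", "A", "C", "A"]
print(top_n_hit("A", ranked, 1), top_n_hit("A", ranked, 5))
print(top_n_match_fraction("A", ranked, 5))
```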

The results show that the DeepID feature outperforms PCA at the same dimension (160).


[1]. Sun Y, Wang X, Tang X. Deep learning face representation from predicting 10,000 classes[C]//Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014: 1891-1898.
