{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Face detection and recognition inference pipeline\n", "\n", "The following example illustrates how to use the `facenet_pytorch` python package to perform face detection and recogition on an image dataset using an Inception Resnet V1 pretrained on the VGGFace2 dataset.\n", "\n", "The following Pytorch methods are included:\n", "* Datasets\n", "* Dataloaders\n", "* GPU/CPU processing" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from facenet_pytorch import MTCNN, InceptionResnetV1\n", "import torch\n", "from torch.utils.data import DataLoader\n", "from torchvision import datasets\n", "import numpy as np\n", "import pandas as pd\n", "import os\n", "\n", "workers = 0 if os.name == 'nt' else 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Determine if an nvidia GPU is available" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running on device: cuda:0\n" ] } ], "source": [ "device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')\n", "print('Running on device: {}'.format(device))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define MTCNN module\n", "\n", "Default params shown for illustration, but not needed. Note that, since MTCNN is a collection of neural nets and other code, the device must be passed in the following way to enable copying of objects when needed internally.\n", "\n", "See `help(MTCNN)` for more details." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "mtcnn = MTCNN(\n", " image_size=160, margin=0, min_face_size=20,\n", " thresholds=[0.6, 0.7, 0.7], factor=0.709, post_process=True,\n", " device=device\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define Inception Resnet V1 module\n", "\n", "Set classify=True for pretrained classifier. For this example, we will use the model to output embeddings/CNN features. Note that for inference, it is important to set the model to `eval` mode.\n", "\n", "See `help(InceptionResnetV1)` for more details." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "resnet = InceptionResnetV1(pretrained='vggface2').eval().to(device)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define a dataset and data loader\n", "\n", "We add the `idx_to_class` attribute to the dataset to enable easy recoding of label indices to identity names later one." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def collate_fn(x):\n", " return x[0]\n", "\n", "dataset = datasets.ImageFolder('../data/test_images')\n", "dataset.idx_to_class = {i:c for c, i in dataset.class_to_idx.items()}\n", "loader = DataLoader(dataset, collate_fn=collate_fn, num_workers=workers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Perfom MTCNN facial detection\n", "\n", "Iterate through the DataLoader object and detect faces and associated detection probabilities for each. The `MTCNN` forward method returns images cropped to the detected face, if a face was detected. 
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Face detected with probability: 0.999957\n", "Face detected with probability: 0.999927\n", "Face detected with probability: 0.999662\n", "Face detected with probability: 0.999873\n", "Face detected with probability: 0.999991\n" ] } ], "source": [ "aligned = []\n", "names = []\n", "for x, y in loader:\n", "    x_aligned, prob = mtcnn(x, return_prob=True)\n", "    if x_aligned is not None:\n", "        print('Face detected with probability: {:8f}'.format(prob))\n", "        aligned.append(x_aligned)\n", "        names.append(dataset.idx_to_class[y])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Calculate image embeddings\n", "\n", "MTCNN returns face images that are all the same size, enabling easy batch processing with the Resnet recognition module. Here, since we only have a few images, we build a single batch and perform inference on it.\n", "\n", "For real datasets, the code should be modified to control the batch sizes passed to the Resnet, particularly when processing on a GPU. For repeated testing, it is best to separate face detection (using MTCNN) from embedding or classification (using InceptionResnetV1), as the calculation of cropped faces or bounding boxes can then be performed a single time, with the detected faces saved for future use." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "aligned = torch.stack(aligned).to(device)\n", "embeddings = resnet(aligned).detach().cpu()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Print distance matrix for classes" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " angelina_jolie bradley_cooper kate_siegel paul_rudd \\\n", "angelina_jolie 0.000000 1.344806 0.781201 1.425579 \n", "bradley_cooper 1.344806 0.000000 1.256238 0.922126 \n", "kate_siegel 0.781201 1.256238 0.000000 1.366423 \n", "paul_rudd 1.425579 0.922126 1.366423 0.000000 \n", "shea_whigham 1.448495 0.891145 1.416447 0.985438 \n", "\n", " shea_whigham \n", "angelina_jolie 1.448495 \n", "bradley_cooper 0.891145 \n", "kate_siegel 1.416447 \n", "paul_rudd 0.985438 \n", "shea_whigham 0.000000 \n" ] } ], "source": [ "dists = [[(e1 - e2).norm().item() for e2 in embeddings] for e1 in embeddings]\n", "print(pd.DataFrame(dists, columns=names, index=names))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }