smart-interactive-display/Assets/StreamingAssets/MergeFace/Facenet/examples/infer.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Face detection and recognition inference pipeline\n",
"\n",
"The following example illustrates how to use the `facenet_pytorch` python package to perform face detection and recogition on an image dataset using an Inception Resnet V1 pretrained on the VGGFace2 dataset.\n",
"\n",
"The following Pytorch methods are included:\n",
"* Datasets\n",
"* Dataloaders\n",
"* GPU/CPU processing"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from facenet_pytorch import MTCNN, InceptionResnetV1\n",
"import torch\n",
"from torch.utils.data import DataLoader\n",
"from torchvision import datasets\n",
"import numpy as np\n",
"import pandas as pd\n",
"import os\n",
"\n",
"workers = 0 if os.name == 'nt' else 4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Determine if an nvidia GPU is available"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running on device: cuda:0\n"
]
}
],
"source": [
"device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')\n",
"print('Running on device: {}'.format(device))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Define MTCNN module\n",
"\n",
"Default params shown for illustration, but not needed. Note that, since MTCNN is a collection of neural nets and other code, the device must be passed in the following way to enable copying of objects when needed internally.\n",
"\n",
"See `help(MTCNN)` for more details."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"mtcnn = MTCNN(\n",
" image_size=160, margin=0, min_face_size=20,\n",
" thresholds=[0.6, 0.7, 0.7], factor=0.709, post_process=True,\n",
" device=device\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Define Inception Resnet V1 module\n",
"\n",
"Set classify=True for pretrained classifier. For this example, we will use the model to output embeddings/CNN features. Note that for inference, it is important to set the model to `eval` mode.\n",
"\n",
"See `help(InceptionResnetV1)` for more details."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"resnet = InceptionResnetV1(pretrained='vggface2').eval().to(device)"
]
},
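{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional aside (not used in the rest of this example), the commented line below sketches the `classify=True` option mentioned above, which makes the forward pass return logits over the pretraining identities instead of 512-dimensional embeddings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: load the pretrained classifier head instead of the embedding model.\n",
"# resnet_classifier = InceptionResnetV1(pretrained='vggface2', classify=True).eval().to(device)"
]
},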
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Define a dataset and data loader\n",
"\n",
"We add the `idx_to_class` attribute to the dataset to enable easy recoding of label indices to identity names later one."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def collate_fn(x):\n",
" return x[0]\n",
"\n",
"dataset = datasets.ImageFolder('../data/test_images')\n",
"dataset.idx_to_class = {i:c for c, i in dataset.class_to_idx.items()}\n",
"loader = DataLoader(dataset, collate_fn=collate_fn, num_workers=workers)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Perfom MTCNN facial detection\n",
"\n",
"Iterate through the DataLoader object and detect faces and associated detection probabilities for each. The `MTCNN` forward method returns images cropped to the detected face, if a face was detected. By default only a single detected face is returned - to have `MTCNN` return all detected faces, set `keep_all=True` when creating the MTCNN object above.\n",
"\n",
"To obtain bounding boxes rather than cropped face images, you can instead call the lower-level `mtcnn.detect()` function. See `help(mtcnn.detect)` for details."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Face detected with probability: 0.999957\n",
"Face detected with probability: 0.999927\n",
"Face detected with probability: 0.999662\n",
"Face detected with probability: 0.999873\n",
"Face detected with probability: 0.999991\n"
]
}
],
"source": [
"aligned = []\n",
"names = []\n",
"for x, y in loader:\n",
" x_aligned, prob = mtcnn(x, return_prob=True)\n",
" if x_aligned is not None:\n",
" print('Face detected with probability: {:8f}'.format(prob))\n",
" aligned.append(x_aligned)\n",
" names.append(dataset.idx_to_class[y])"
]
},
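{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next cell is an illustrative sketch (not part of the original pipeline) of the lower-level `mtcnn.detect()` call mentioned above: it returns bounding boxes and detection probabilities instead of cropped face tensors. It reuses the `loader` and `dataset` objects defined earlier."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative only: get bounding boxes and probabilities rather than cropped faces.\n",
"# mtcnn.detect(img) returns an Nx4 array of boxes and an array of per-face probabilities.\n",
"# (To have the forward call above return every detected face, construct MTCNN(..., keep_all=True).)\n",
"for x, y in loader:\n",
"    boxes, probs = mtcnn.detect(x)\n",
"    print(dataset.idx_to_class[y], boxes, probs)"
]
},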
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Calculate image embeddings\n",
"\n",
"MTCNN will return images of faces all the same size, enabling easy batch processing with the Resnet recognition module. Here, since we only have a few images, we build a single batch and perform inference on it. \n",
"\n",
"For real datasets, code should be modified to control batch sizes being passed to the Resnet, particularly if being processed on a GPU. For repeated testing, it is best to separate face detection (using MTCNN) from embedding or classification (using InceptionResnetV1), as calculation of cropped faces or bounding boxes can then be performed a single time and detected faces saved for future use."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"aligned = torch.stack(aligned).to(device)\n",
"embeddings = resnet(aligned).detach().cpu()"
]
},
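{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below is an optional sketch of the batching and caching suggestion above; the `batch_size` value and the save path are illustrative choices, not part of the original pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch for larger datasets: embed the aligned faces in fixed-size batches to bound\n",
"# GPU memory use, and save the aligned faces so MTCNN detection need not be repeated.\n",
"batch_size = 32  # illustrative value\n",
"batched_embeddings = torch.cat([\n",
"    resnet(batch.to(device)).detach().cpu()\n",
"    for batch in torch.split(aligned, batch_size)\n",
"])\n",
"# torch.save(aligned.cpu(), '../data/aligned_faces.pt')  # illustrative path for reuse"
]
},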
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Print distance matrix for classes"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" angelina_jolie bradley_cooper kate_siegel paul_rudd \\\n",
"angelina_jolie 0.000000 1.344806 0.781201 1.425579 \n",
"bradley_cooper 1.344806 0.000000 1.256238 0.922126 \n",
"kate_siegel 0.781201 1.256238 0.000000 1.366423 \n",
"paul_rudd 1.425579 0.922126 1.366423 0.000000 \n",
"shea_whigham 1.448495 0.891145 1.416447 0.985438 \n",
"\n",
" shea_whigham \n",
"angelina_jolie 1.448495 \n",
"bradley_cooper 0.891145 \n",
"kate_siegel 1.416447 \n",
"paul_rudd 0.985438 \n",
"shea_whigham 0.000000 \n"
]
}
],
"source": [
"dists = [[(e1 - e2).norm().item() for e2 in embeddings] for e1 in embeddings]\n",
"print(pd.DataFrame(dists, columns=names, index=names))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}