Often, the examples you see around computer vision and deep learning are about classification. That class of problems asks: what do you see in the image? Object detection is another class of problems, one that asks: where in the image do you see it?
Classification answers what, and object detection answers where. Object detection has made great advances in recent years. The hello world of object detection would be using HOG features combined with a classifier like an SVM, sliding a window across the image to make predictions at different patches. This complex pipeline has a major drawback!
Speed becomes a major concern when we think of running these models on the edge: IoT devices, mobile phones, cars. For example, a car needs to detect where other cars, people, and bikes are, to name a few; I could go on… puppies, kittens… you get the idea.
The major motivation for me is the need for speed given the constraints that edge devices have; we need compact models that make quick predictions and are energy efficient. The latest trend in object detection is to use a convolutional neural network (CNN) that outputs a regression to predict the bounding boxes.
This post is about SqueezeDet. I got interested because it uses one of my favorite CNNs, SqueezeNet! Inspired by YOLO, SqueezeDet is a single-stage detection pipeline that does region proposal and classification with one single network.
The CNN first extracts feature maps from the input image and feeds them to the ConvDet layer. ConvDet takes the feature maps, overlays them with a WxH grid and, at each cell, computes K pre-defined bounding boxes called anchors. Each bounding box carries its coordinates, a confidence score, and the class probabilities. The final step is to use non-max suppression (NMS) to filter the bounding boxes and make the final predictions.
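The NMS filtering step is simple enough to sketch. Below is a generic greedy NMS in NumPy, not SqueezeDet's exact implementation; the 0.5 IoU threshold is an assumed common default:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-max suppression over [x1, y1, x2, y2] boxes."""
    order = np.argsort(scores)[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the top box against the remaining candidates
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # drop boxes that overlap the kept box too much
        order = order[1:][iou < iou_threshold]
    return keep
```

Each iteration keeps the highest-scoring remaining box and discards every candidate that overlaps it beyond the threshold, which is why overlapping duplicate detections collapse to one.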
The network regresses and learns how to transform the highest-probability bounding boxes into the predictions. Since bounding boxes are generated at each cell of the grid, the top N bounding boxes, sorted by confidence score, are kept as the predictions. The figure above shows the four-part loss function that makes this entire model possible.

A little while ago, I came across a very interesting project in which the author uses a webcam to play the classic fighting game Mortal Kombat.
He utilizes a combination of a convolutional neural network and a recurrent neural network to identify kicking and punching actions from his webcam recording. A very cool way to play the game, indeed! Using this as inspiration, I created a similar controller interface that can play first-person shooter games using the predictions of a TensorFlow object detection model. The code for this project can be found on my GitHub page and is also linked below.
This controller is designed to handle the following actions in the game.

Aiming the gun. First, in order to look around in the game, I am using object detection on a tennis ball. Based on the location of the ball detected on the screen, we can set the position of the mouse, which in turn controls where our player aims in the game.

Moving the player. Next, to instruct the player to move forward in the game, I am using detection of my index finger. When the finger points up, the player moves forward, and putting the finger down again stops the movement of the player.
Shooting the gun. The third action supported here is shooting the gun. Since both hands are used up in aiming and moving, I am using an open-mouth gesture to control shooting. The model has been trained on various images of tennis balls, raised fingers, and of teeth indicating an open mouth. It runs at a reasonable rate, making it possible to use this lightweight model in real time to control our games.
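The mapping from detections to controls can be sketched in a few lines. The class names, score threshold, and screen size below are my assumptions for illustration, not the project's exact values; a real run would feed webcam frames through the detector and drive the mouse and keyboard with a library such as pyautogui:

```python
SCREEN_W, SCREEN_H = 1920, 1080  # assumed display resolution

def box_center_to_screen(box, screen_w=SCREEN_W, screen_h=SCREEN_H):
    """Map a normalized [ymin, xmin, ymax, xmax] detection box
    (TF Object Detection API convention) to screen coordinates."""
    ymin, xmin, ymax, xmax = box
    cx = (xmin + xmax) / 2.0
    cy = (ymin + ymax) / 2.0
    return int(cx * screen_w), int(cy * screen_h)

def boxes_to_actions(detections, score_threshold=0.6):
    """Translate one frame's detections into game actions.
    detections is a list of (class_name, score, box) tuples."""
    actions = {"aim": None, "move_forward": False, "shoot": False}
    for cls, score, box in detections:
        if score < score_threshold:
            continue
        if cls == "tennis_ball":        # aim where the ball is
            actions["aim"] = box_center_to_screen(box)
        elif cls == "raised_finger":    # finger up -> walk forward
            actions["move_forward"] = True
        elif cls == "open_mouth":       # open mouth -> fire
            actions["shoot"] = True
    return actions
```

A ball detected in the center of the frame would map to the center of the screen, so moving the ball pans the aim proportionally.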
In terms of model performance, the detection of the finger and teeth seems fairly reliable while playing the game. The main trouble is aiming the gun exactly where we want, since the model runs at a much lower frame rate than the game, which makes the movement of the mouse jumpy rather than smooth.
Also, detection of the ball towards the edges of the image is poor, making it unreliable there. This issue could be addressed by tweaking the model to reliably detect objects a bit farther away from the webcam, so that we have enough space to move the tennis ball and thereby better control our aim. The in-game results of this model can be found on my YouTube channel, with the video embedded below.
I feel the overall experience of controlling games with just the webcam and no extra hardware remains a very enticing concept, and it has become very much possible thanks to advances in deep learning models.
Practical implementations of this control mechanism would need to be near-perfect in order to replace the more conventional ways of playing these games, but I can see a polished version of this idea being a fun way to play FPS games. Thank you for reading!

In order to fully integrate deep learning into robotics, it is important that deep learning systems can reliably estimate the uncertainty in their predictions.
Current approaches towards uncertainty estimation for deep learning include calibration techniques and Bayesian deep learning with approximations such as Monte Carlo Dropout or ensemble methods. Our work focuses on Bayesian deep learning approaches for the specific use case of object detection on a robot in open-set conditions.
We introduce Probabilistic Object Detection, the task of detecting objects in images and accurately quantifying the spatial and semantic uncertainties of the detections. Given the lack of methods capable of assessing such probabilistic object detections, we present the new Probability-based Detection Quality (PDQ) measure. Nature Machine Intelligence.

To safely operate in the real world, robots need to evaluate how confident they are about what they see. A new competition challenges computer vision algorithms to not just detect and localize objects, but also report how certain they are.
To this end, we introduce Probabilistic Object Detection, the task of detecting objects in images and accurately quantifying the spatial and semantic uncertainties of the detections. This paper provides the first benchmark for sampling-based probabilistic object detectors. A probabilistic object detector expresses uncertainty for all detections in a way that reliably indicates object localisation and classification performance.

There has been a recent emergence of sampling-based techniques for estimating epistemic uncertainty in deep neural networks.
While these methods can be applied to classification or semantic segmentation tasks by simply averaging samples, this is not the case for object detection, where detection sample bounding boxes must be accurately associated and merged.
A weak merging strategy can significantly degrade the performance of the detector and yield an unreliable uncertainty measure. This paper provides the first in-depth investigation of the effect of different association and merging strategies.
We compare different combinations of three spatial and two semantic affinity measures with four clustering methods for MC Dropout with a Single Shot Multi-Box Detector. Our results show that the correct choice of affinity-clustering combination can greatly improve the effectiveness of the classification and spatial uncertainty estimation and the resulting object detection performance.
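To make the association-and-merging problem concrete, here is a minimal sketch of one of the simplest strategies: greedily clustering sampled boxes by IoU affinity against a cluster's running mean, then reporting each cluster's mean box and per-coordinate variance. The 0.7 affinity threshold and the variance readout are illustrative assumptions, not the paper's tuned choices:

```python
import numpy as np

def iou(a, b):
    """IoU between two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def merge_samples(boxes, iou_affinity=0.7):
    """Cluster detection samples (e.g. from MC Dropout forward passes)
    by spatial affinity; return (mean_box, coordinate_variance) pairs,
    the variance serving as a crude spatial-uncertainty estimate."""
    clusters = []
    for box in boxes:
        box = np.asarray(box, dtype=float)
        for c in clusters:
            if iou(box, np.mean(c, axis=0)) >= iou_affinity:
                c.append(box)
                break
        else:
            clusters.append([box])
    return [(np.mean(c, axis=0), np.var(c, axis=0)) for c in clusters]
```

As the paper argues, the choice here matters: a threshold that is too loose merges distinct objects into one blurred box, while one that is too strict splits samples of the same object and underestimates how confident the detector really is.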
We base our evaluation on a new mix of datasets that emulate near open-set conditions (semantically similar unknown classes), distant open-set conditions (semantically dissimilar unknown classes), and the common closed-set conditions (only known classes).

Did You Miss the Sign? In this paper, we propose an approach to identify traffic signs that have been mistakenly discarded by the object detector. The proposed method raises an alarm when it discovers a failure of the object detector to detect a traffic sign.
This approach can be useful for evaluating the performance of the detector during the deployment phase. We trained a single-shot multi-box object detector to detect traffic signs and used its internal features to train a separate false negative detector (FND).
During deployment, the FND decides whether the traffic sign detector has missed a sign or not. Dropout Variational Inference, or Dropout Sampling, has recently been proposed as an approximation technique for Bayesian Deep Learning and evaluated for image classification and regression tasks.
A paper list of object detection using deep learning. I wrote this page with reference to this survey paper and my own searching. The parts highlighted with red characters are papers that I think are "must-read".
However, that is my personal opinion, and other papers are important too, so I recommend reading them if you have time. The FPS (speed) index depends on the hardware spec (e.g. the GPU used) on which each paper was measured, so numbers are hard to compare directly. The solution would be to measure the performance of all models on hardware with equivalent specifications, but that is very difficult and time consuming.
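If you do want comparable numbers on your own hardware, timing a model is straightforward; a rough sketch (the warm-up count and timing method are my choices, and `run_inference` stands in for any detector's forward pass):

```python
import time

def measure_fps(run_inference, frames, warmup=5):
    """Average frames-per-second of a model over a list of frames.
    run_inference is any callable that processes one frame."""
    for f in frames[:warmup]:       # warm-up: caches, JIT, GPU init
        run_inference(f)
    start = time.perf_counter()
    for f in frames:
        run_inference(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```

The warm-up passes matter on GPUs, where the first few inferences pay one-time initialization costs that would otherwise skew the average.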
Automatic adaptation of object detectors to new domains using self-training [CVPR'19] [pdf]. What Object Should I Use? Object detection with location-aware deformable convolution and backward attention filtering [CVPR'19]. Statistics of commonly used object detection datasets. The figure came from this survey paper.
The papers related to datasets used mainly in object detection are as follows.

We present a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. The key idea is to focus on those parts of the image that contain richer information and zoom in on them. We train an intelligent agent that, given an image window, is capable of deciding where to focus its attention among five different predefined region candidates (smaller windows).
This procedure is iterated, providing a hierarchical image analysis. We compare two different candidate proposal strategies to guide the object search: with and without overlap. Moreover, our work compares two different strategies to extract features from a convolutional neural network for each region proposal: a first one that computes new feature maps for each region proposal, and a second one that computes the feature maps for the whole image and later generates crops for each region proposal.
Experiments indicate better results for the overlapping candidate proposal strategy and a loss of performance for the cropped image features due to the loss of spatial resolution. We argue that, while this loss seems unavoidable when working with large amounts of object candidates, the much smaller number of region proposals generated by our reinforcement learning agent makes it feasible to extract features for each location without sharing convolutional computation among regions.
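To illustrate the second strategy (computing the feature map once for the whole image, then cropping per region), here is a toy NumPy sketch of a crude ROI pooling. The 32x32x512 map size and 7x7 output grid are illustrative assumptions, not the paper's exact configuration, and the random array stands in for a real convolutional feature map:

```python
import numpy as np

# Dummy "conv features" for a whole image: H x W x C feature map.
# In practice this would come from a late VGG conv stage; here it is
# random data just to show the indexing.
feat = np.random.rand(32, 32, 512)

def crop_region_features(feat, region, out_size=7):
    """Crop the shared feature map for one region proposal and
    max-pool it to a fixed out_size x out_size grid."""
    y0, x0, y1, x1 = region          # region in feature-map coordinates
    crop = feat[y0:y1, x0:x1, :]
    h, w, c = crop.shape
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    pooled = np.zeros((out_size, out_size, c))
    for i in range(out_size):
        for j in range(out_size):
            pooled[i, j] = crop[ys[i]:ys[i+1], xs[j]:xs[j+1], :].max(axis=(0, 1))
    return pooled

pooled = crop_region_features(feat, (0, 0, 16, 16))
# pooled.shape -> (7, 7, 512)
```

The loss of spatial resolution the paragraph mentions is visible here: a small region maps to only a few feature-map cells, so the fixed 7x7 grid has little detail to pool from.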
This Python code enables you to both train and test each of the two models proposed in the paper. The Image Zooms model extracts features for each region visited, whereas the Pool45 Crops model extracts features just once and then ROI-pools them for each subregion.
In this section we describe how to use the code. The code uses the Keras framework. If you are using a virtual environment, you can use the requirements.txt file to install the dependencies. First, it is important to note that this code is already an extension of the code used for the paper.
During the training stage, we do not consider only one object per image; we also train for other objects by covering the already-found objects with the VGG mean, inspired by the work of Caicedo et al. If you want to use some pre-trained models for the Deep Q-network, they can be downloaded at the following link: Image Zooms model. Notice that these models could lead to different results compared to the ones reported in the paper, because they have been trained to find more than one instance of planes in the image.
We will follow, as an example, how to train the Image Zooms model, which is the one that achieves the better results. The instructions are the same for training the Pool45 Crops model. The default paths are the following: The training of the models enables checkpointing, so you should indicate which epoch you are starting from when running the script. If you are training from scratch, then the training command should be:
We have trained it for planes, and all the experiments in the paper are run on this class, but you can test other Pascal categories, also changing the training databases appropriately.

Our sophisticated technology connects to your current security camera to proactively detect and help prevent crimes before they happen.
Always be informed with live alerts. Our neural networks are developed from scratch by PhDs in the field of deep learning and computer vision. Once deployed, Athena is always protecting you, helping to detect threats in your environment. Once a threat is detected, Athena sends a real-time video feed to your security staff, administrator, or whomever you designate. At the same time, Athena alerts the criminal that they have been identified and that the authorities have been called.
If configured to do so, Athena will then call the police and send them a video feed and other pertinent information they need to help prevent the crime from happening. Athena acts as your virtual WatchGuard, watching many cameras at once and alerting the appropriate parties when a crime is about to be committed.
Athena Security provides never-before-used proactive threat detection, such as gun detection before an active shooter uses a weapon and knife detection before a villain tries to hurt someone.
Collection of papers, datasets, code and other resources for object detection and tracking using deep learning.