Object Detection using DETR

Posted By: Pradeep Farthyal | 1st September 2020

DETR (DEtection TRansformer) combines a set-based global loss, which forces unique predictions via bipartite matching, with a transformer encoder-decoder architecture. Modern deep learning detectors perform object detection in multiple steps, which leads to the problem of duplicate predictions and false positives. DETR aims to solve this in an innovative and efficient way.
It treats object detection as a direct set prediction problem, using an encoder-decoder architecture based on transformers. By "set", I mean the collection of bounding boxes predicted for an image.

Transformer: It is an architecture that transforms one sequence into another with the help of two parts (an encoder and a decoder), but it differs from earlier sequence-to-sequence models because it does not use any recurrent network (GRU, LSTM, etc.).

It relies on a simple yet powerful mechanism called attention, which allows a model to selectively focus on certain parts of its input and thus process that input more effectively.
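To make the attention idea concrete, here is a minimal single-head sketch of scaled dot-product attention in plain Python. It is only an illustration: the transformer in DETR uses multi-head attention with learned projections, which this sketch leaves out, and the function names are my own, not from any DETR library.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    queries: n query vectors (each of length d)
    keys:    m key vectors (each of length d)
    values:  m value vectors (each of length d_v)
    Returns n output vectors (each of length d_v).
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # Attention weights: how much this query "focuses" on each input.
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

print(attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]]))
```

The query [1, 0] is more similar to the first key, so the output leans toward the first value vector (roughly [6.70, 3.30]) instead of picking it exclusively.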

DETR Pipeline:

Inference:

Compute image features with the backbone
Pass the features through the transformer encoder-decoder architecture
Output a set of predictions (class labels and bounding boxes)
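The last inference step can be sketched as follows. DETR always outputs a fixed number N of predictions, and its classifier includes an extra "no object" class (as described in the DETR paper), so turning the predicted set into final detections only needs a confidence filter rather than NMS. This is a minimal sketch: the 0.7 threshold, the assumption that "no object" is the last logit, and the function name `select_detections` are illustrative choices, not taken from the original post.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_detections(pred_logits, pred_boxes, threshold=0.7):
    """Keep predictions whose most likely real class is confident enough.

    pred_logits: N lists of K+1 class logits; the LAST entry is assumed
                 to be the "no object" class.
    pred_boxes:  N boxes as normalized (cx, cy, w, h).
    Returns a list of (class_index, score, box) tuples.
    """
    detections = []
    for logits, box in zip(pred_logits, pred_boxes):
        probs = softmax(logits)[:-1]  # drop the "no object" probability
        score = max(probs)
        if score > threshold:
            detections.append((probs.index(score), score, box))
    return detections

# Three prediction slots, two real classes plus "no object".
dets = select_detections(
    [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 4.0, 0.0]],
    [(0.3, 0.3, 0.2, 0.2), (0.5, 0.5, 0.1, 0.1), (0.7, 0.6, 0.3, 0.4)],
)
print([d[0] for d in dets])
```

The second slot is dominated by "no object" and is simply dropped; no overlap-based suppression is involved.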

Training:

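Compute image features with the backbone
Pass them through the transformer encoder-decoder architecture to predict a set of boxes
Match the predicted set to the ground-truth boxes via bipartite matching and compute the set-based loss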
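The bipartite matching step can be sketched like this. DETR finds the optimal one-to-one assignment between its predictions and the ground-truth objects with the Hungarian algorithm, using a cost that combines class probability and box distance (the paper also adds a generalized IoU term). This sketch swaps in exhaustive search over permutations, which finds the same optimal assignment but only scales to tiny examples, and it uses only an L1 box cost. The weight `box_weight=5.0` and all function names are assumptions for illustration.

```python
from itertools import permutations

def l1_distance(box_a, box_b):
    # Sum of absolute differences between normalized (cx, cy, w, h) boxes.
    return sum(abs(a - b) for a, b in zip(box_a, box_b))

def match_cost(class_probs, pred_box, gt_class, gt_box, box_weight=5.0):
    # Lower is better: reward a high probability for the true class and
    # penalize box mismatch. box_weight is an assumed hyperparameter.
    return -class_probs[gt_class] + box_weight * l1_distance(pred_box, gt_box)

def bipartite_match(pred_probs, pred_boxes, gt_classes, gt_boxes):
    """Return the lowest-cost one-to-one assignment {gt index: prediction index}.

    Exhaustive search stands in for the Hungarian algorithm here, so this
    is only feasible for very small inputs.
    """
    n_pred, n_gt = len(pred_boxes), len(gt_boxes)
    best_cost, best_assignment = float("inf"), None
    # Each permutation gives every ground-truth object a distinct prediction.
    for chosen in permutations(range(n_pred), n_gt):
        cost = sum(
            match_cost(pred_probs[p], pred_boxes[p], gt_classes[g], gt_boxes[g])
            for g, p in enumerate(chosen)
        )
        if cost < best_cost:
            best_cost, best_assignment = cost, chosen
    return {g: p for g, p in enumerate(best_assignment)}

pred_probs = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
pred_boxes = [(0.2, 0.2, 0.1, 0.1), (0.7, 0.7, 0.2, 0.2), (0.5, 0.5, 0.5, 0.5)]
print(bipartite_match(pred_probs, pred_boxes, [0, 1],
                      [(0.7, 0.7, 0.2, 0.2), (0.2, 0.2, 0.1, 0.1)]))
```

Here each ground-truth object gets exactly one prediction, and prediction 2, left unmatched, would be trained toward the "no object" class. Because every object is claimed by only one prediction, the loss itself discourages duplicates.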

Advantages of the DETR Pipeline:

Easy to use
No custom layers
Easily extended to other tasks
No need for prior knowledge such as anchors, or hand-crafted algorithms like non-maximum suppression (NMS)

DETR Architecture:

It contains three components:

A CNN backbone that extracts a compact feature representation
A transformer encoder-decoder
An FFN (Feed Forward Network) that makes the final detection predictions

Backbone: DETR commonly uses ResNet-50 as its backbone, although in principle any backbone can be used depending on the complexity of the task. The backbone produces a compact, lower-resolution feature map that serves as a refined representation of the image.
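A quick way to see what "compact" means is to follow the tensor shapes on the input side. The sketch below assumes the ResNet-50 setup from the DETR paper (stride 32, 2048 output channels, a 1x1 convolution projecting to d = 256) and rounds the feature-map size up; exact sizes depend on the backbone's padding, and the function name is hypothetical.

```python
import math

def detr_input_shapes(height, width, stride=32, backbone_channels=2048, d_model=256):
    """Approximate tensor shapes on the input side of DETR (sketch).

    Assumed defaults follow the ResNet-50 setup in the DETR paper: the
    backbone shrinks the spatial size by `stride` and outputs
    `backbone_channels` channels; a 1x1 convolution then projects these
    to `d_model` channels before the grid is flattened into tokens.
    """
    # Spatial size of the backbone feature map (rounded up).
    feat_h = math.ceil(height / stride)
    feat_w = math.ceil(width / stride)
    return {
        "image": (3, height, width),
        "backbone_features": (backbone_channels, feat_h, feat_w),
        "projected_features": (d_model, feat_h, feat_w),
        # The encoder sees feat_h * feat_w tokens, each of size d_model.
        "encoder_tokens": (feat_h * feat_w, d_model),
    }

for name, shape in detr_input_shapes(800, 1216).items():
    print(name, shape)
```

For an 800x1216 image this gives a 25x38 feature map, i.e. 950 tokens for the encoder instead of nearly a million pixels, which is what keeps attention over the whole image affordable.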

Encoder: Each encoder layer has a standard architecture and consists of a multi-head self-attention module and an FFN.

Decoder: It follows the standard architecture of the transformer, transforming N embeddings of size d using multi-head self-attention and encoder-decoder attention mechanisms.

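The difference from the original transformer is that DETR decodes the N objects in parallel at each decoder layer, whereas the original transformer is autoregressive and predicts the output sequence one element at a time.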

FFN: It predicts the normalized center coordinates, height, and width of each box with respect to the input image, and a linear layer predicts the class label using a softmax function.
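Because the FFN outputs normalized center coordinates plus width and height, the boxes have to be converted back to pixel corners before drawing them on the image. A minimal sketch, where the function name and the (x0, y0, x1, y1) output convention are assumptions:

```python
def box_cxcywh_to_xyxy(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to pixel corners (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,   # left edge
        (cy - h / 2) * image_height,  # top edge
        (cx + w / 2) * image_width,   # right edge
        (cy + h / 2) * image_height,  # bottom edge
    )

# A box centered in a 1000x500 image, 20% as wide and 40% as tall.
print(box_cxcywh_to_xyxy((0.5, 0.5, 0.2, 0.4), 1000, 500))
```

Predicting normalized values keeps the outputs independent of the input resolution; the image size only enters at this final conversion.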

