๐Ÿ“ AI & Bigdata/Computer Vision

[CV] ์ด๋ฏธ์ง€ ์ธ์‹, Object Detection์ด๋ž€?

SOIT 2022. 7. 20. 14:57

๊ธฐ์กด์˜ ์ด๋ฏธ์ง€ ์ธ์‹

 

class: car, person

์—ฌ๊ธฐ ์ด ์‚ฌ์ง„์„ class๋กœ ๋ถ„๋ฅ˜ํ•œ๋‹ค๋ฉด ์‚ฌ๋žŒ๊ณผ ์ž๋™์ฐจ ์ด ๋‘๊ฐ€์ง€๋กœ ๋‹จ์ˆœํ•˜๊ฒŒ ๋ถ„๋ฅ˜๋ฅผ ํ•ด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๊ธฐ์กด์— ์ด๋ฏธ์ง€๋ฅผ ๋ถ„๋ฅ˜ํ•œ๋‹ค๊ณ  ํ•˜๋ฉด ๊ฐ„๋‹จํ•˜๊ฒŒ ๊ทธ ๊ฐ์ฒด๊ฐ€ ๋ฌด์—‡์ธ์ง€ ์•Œ๋ ค์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค.

 

(multiple) Classification + Localization

object detection์€ ๊ฒน์ณ์ ธ ์žˆ๋Š” ๋Œ€์ƒ๋“ค๊ณผ ๋ฐฐ๊ฒฝ์œผ๋กœ ๊ตฌ์„ฑ๋œ ๋ณต์žกํ•œ  ์ด๋ฏธ์ง€๋ฅผ ๋ณด๊ณ , ๋‹จ์ˆœํžˆ ๋ถ„๋ฅ˜(classify)ํ•  ๋ฟ ์•„๋‹ˆ๋ผ ์„œ๋กœ์˜ ๊ด€๊ณ„๊นŒ์ง€ ๊ณ ๋ คํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

 

"๊ฐ์ฒด๊ฐ€ ๋ฌด์—‡์ธ์ง€"๋งŒ ๋ถ„๋ฅ˜ํ•˜๋Š”๊ฒŒ ์•„๋‹ˆ๋ผ "๊ฐ์ฒด๊ฐ€ ์–ด๋””์— ์žˆ๋Š”์ง€"๊นŒ์ง€ ์•Œ๋ ค์ฃผ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

 


objec detection์˜ ๋ฐฉ๋ฒ• 3๊ฐ€์ง€

  • Classification, Object Detection, Segmentation

 

  1. Single Object
    • Classification: Image Category Recognition (e.g. ๊ฐ•์•„์ง€, ๊ณ ์–‘์ด)
    • Classification + Localization : Object Bounding Box Recognition
      • ๋”๋ณด๊ธฐ
        ๊ฐ์ฒด๋ฅผ ๋ถ„๋ฅ˜ํ•˜๊ณ , bounding box๋กœ Regression์„ ํ†ตํ•ด ๊ฐ์ฒด์˜ ์ขŒํ‘œ๊ฐ€ ์–ด๋””์— ์žˆ๋Š”์ง€ ์ฐพ์•„ ๊ฐ์ฒด๊ฐ€ ์–ด๋””์— ์žˆ๋Š”์ง€๋‚˜ํƒ€๋‚ด ์ค๋‹ˆ๋‹ค.์ด๋•Œ ํ•œ ๊ฐœ์˜ ๊ฐ์ฒด์—๋งŒ ์ ์šฉ๋œ๋‹ค๋Š” ์ ์ด Object Detection๊ณผ ์ฐจ์ด์ 
  2. Multiple Object
    • Object Detection: Classification + Localization
    • Segmentation: Pixel Category Recognition
      • semantic segmentation: ๊ฐ ํ•ด๋‹นํ•˜๋Š” pixel์„ ๊ฐ€์ง€๊ณ  class๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋ฐฉ๋ฒ•
      • instance segmentation: instance๊นŒ์ง€ ๊ตฌ๋ถ„ํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ์ฆ‰, edge ์œค๊ณฝ์„ ์„ ๋ณด๊ณ  ๋™์ผํ•œ ๊ฐ์ฒด์ด์ง€๋งŒ (๊ณ ์–‘์ด 1, 2,..) ์ด๋Ÿฐ์‹์œผ๋กœ instance๋ฅผ ๊ตฌ๋ถ„ํ•ด ์ค๋‹ˆ๋‹ค.
๋”๋ณด๊ธฐ

Object Detection: ๊ธฐ์กด Classification ์— multiple ๊ฐ์ฒด๋ฅผ ์ธ์ง€ํ•˜๊ณ , ์ด์— ๋Œ€ํ•œ ์œ„์น˜๊นŒ์ง€ ์•Œ ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.


objec detection

  • Multi-Labeled Classification + Bounding Box Regression(Localization)
    • ์—ฌ๋Ÿฌ๊ฐœ์˜ ๊ฐ์ฒด์— label์„ ๋ถ™์—ฌ Classification, ๊ฐ์ฒด๊ฐ€ ๋ฌด์—‡์ธ์ง€ ํŒ๋‹จ
    • Regression์„ ํ†ตํ•œ Bounding Box์˜ ์ขŒํ‘œ๋ฅผ ํ†ตํ•ด ๊ฐ์ฒด๊ฐ€ ์–ด๋””์— ์žˆ๋Š”์ง€ Localization ํŒ๋‹จ
A Cat at (x, y, w, h)

์ด๋ฏธ์ง€ ์ธ์‹์˜ ์‹œ์ดˆ: Classification

1. CNN(convolution neural netwotk)

  • ~2012, classification
CNN

2012๋…„๊นŒ์ง€๋งŒ ํ•ด๋„ ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜ ๋ถ„์•ผ์— ์žˆ์–ด์„œ ๋‹น์—ฐํ•˜๊ฒŒ ์‚ฌ์šฉ๋˜๋Š” ๊ฒƒ์€ CNN์ด์—ˆ์Šต๋‹ˆ๋‹ค.

CNN ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๊ฐ„๋‹จํ•˜๊ฒŒ ๋ณด๋ฉด, ์ด๋ฏธ์ง€๋ฅผ ์ž…๋ ฅ ๋ฐ›์€ ํ›„, convolution layer ๋™์ž‘์„ ๋ณด๋ฉด ์ปค๋„์ด ์ด๋ฏธ์ง€๋ฅผ ์ง€๋‚˜๊ฐ€๋ฉด์„œ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๊ณ , ์ด๋ฅผ ํ†ตํ•ด, feature map์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์ฆ‰, feature๋ฅผ ์ถ”์ถœํ•˜๊ณ  fully connted layer์—์„œ ์ด๋ฏธ์ง€๋ฅผ ๋ถ„๋ฅ˜ํ•œ๋‹ค๊ณ  ์ƒ๊ฐํ•˜์‹œ๋ฉด ๋ฉ๋‹ˆ๋‹ค.

 

 

2. CNN์˜ Object Detection ์ ‘๊ทผ

  1. Classification: class(๊ณ ์–‘์ด: 0.9, ๊ฐ•์•„์ง€:0.1, …)
  2. multi-Localization(Regression): Region of Interest์˜ object์˜ ์œ„์น˜
    • Regression: object์˜ bounding box์˜ ์ขŒํ‘œ ์˜ˆ์ธก
    • Bounding Box๋ฅผ ๋‹ค์–‘ํ•œ Object ์ข…๋ฅ˜์— ๋Œ€ํ•˜์—ฌ ์ฐพ์•„์ค˜์•ผ ํ•จ

๊ทธ๋Ÿฐ๋ฐ ์ด ๋ฐฉ๋ฒ•์€ Object Detection ๋ถ„์•ผ์— ๋ฐ”๋กœ ์ ์šฉ๋˜์ง€๋Š” ๋ชปํ–ˆ๋Š”๋ฐ์š”.

์™œ๋ƒํ•˜๋ฉด ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€์—์„œ ํ™•๋ฅ ์ ์œผ๋กœ ๊ณ ์–‘์ด, ๊ฐ•์•„์ง€, ์ด๋ ‡๊ฒŒ ๋ถ„๋ฅ˜๋ฅผ ํ•ด ์ฃผ๋Š” ๊ฒƒ์ด์ง€, ๋ช‡๊ฐœ์˜ object ๊ฐ€ ์žˆ๋Š”์ง€ ํ™•์‹คํ•˜์ง€ ์•Š์€ ์ƒํƒœ์—์„œ๋Š” regression์„ ์ ์šฉํ•  ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ์ด๋ฏธ์ง€ ๋‚ด์— Region of Interest์— object์˜ ์œ„์น˜๋ฅผ ์•Œ๋ ค์ฃผ๊ธฐ ์œ„ํ•œ Bounding Box๋ฅผ ๊ทธ๋ ค์ค˜์•ผ ํ•˜๊ณ , ๋‹ค์ˆ˜์˜ Bounding Box๋ฅผ ๋‹ค์–‘ํ•œ Object ์ข…๋ฅ˜์— ๋Œ€ํ•˜์—ฌ ์ฐพ์•„์ค˜์•ผ ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜ ๋ณด๋‹ค๋Š” ํ›จ์”ฌ ๋ณต์žกํ•œ ๋ฌธ์ œ์ž…๋‹ˆ๋‹ค.)

 

๋”ฐ๋ผ์„œ ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด object๊ฐ€ ์žˆ์„๋งŒํ•œ ์˜์—ญ์„ ๋ณด๋„๋ก ํ•˜๋Š” ๋ฐฉ๋ฒ•์ธ Region proposal ๋ฐฉ๋ฒ•์ด ์ œ์‹œ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.


์ด๋ฏธ์ง€ ์ธ์‹ ๋ฐฉ๋ฒ•: Region Proposal

๋Œ€ํ‘œ์ ์ธ ๋ฐฉ๋ฒ• 2๊ฐ€์ง€
  • Sliding Window: ๋ชจ๋“  object์— ๋™์ผํ•œ ํฌ๊ธฐ(๋™์ผํ•œ ์œˆ๋„์šฐ ํฌ๊ธฐ, ๋น„์œจ)๋กœ Object ์ถ”์ถœ
  • Selective Search: ์œ ์‚ฌํ•œ (Color, Texture, Enclosed, ..) Pixel ๊ณ ๋ คํ•ด Object ์œ„์น˜ ํŒŒ์•…
  • greedy ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ด์šฉํ•˜์—ฌ Region์„ ๊ณ„์ธต์  ๊ทธ๋ฃนํ•‘ ๋ฐฉ๋ฒ•์œผ๋กœ ๊ณ„์‚ฐ

window sliding,   seletiv  search

์ด object detection์„ ํ•˜๊ธฐ ์œ„ํ•ด Region proposal ์„ ํ•˜๋Š” ๋Œ€ํ‘œ์ ์ธ ๋ฐฉ๋ฒ•์€ window sliding, seletiv search ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ์žˆ์Šต๋‹ˆ๋‹ค.

window sliding ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์ด๊ฒƒ์€ CNN์˜ convolution layer์—์„œ ์ปค๋„์ด ์ด๋ฏธ์ง€๋ฅผ ์ง€๋‚˜๊ฐ€๋Š” ๊ฒƒ์„ ์ƒ๊ฐํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ง ๊ทธ๋ž˜๋กœ ๋™์ผํ•œ ํฌ๊ธฐ์˜ ,window๋ฅผ ์ง€๋‚˜๊ฐ€๋ฉด์„œ object๊ฐ€ ์žˆ์„ ๋ฒ•ํ•œ ์˜์—ญ์„ ์ถ”์ถœํ•ด ๋‚ด๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. (๊ธฐ์กด์˜ CNN์—์„œ ์ด๋ฏธ์ง€๋ฅผ ์ธ์‹ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๊ธฐ๋„ ํ•ฉ๋‹ˆ๋‹ค.)

๋‹ค์Œ์œผ๋กœ selective search ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ๋‚˜์™”์Šต๋‹ˆ๋‹ค.

์ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ object detection์—์„œ ์ฒ˜์Œ์œผ๋กœ CNN์„ ์‚ฌ์šฉํ•œ R-CNN์˜ ๋ฐฉ๋ฒ•์œผ๋กœ ์‚ฌ์šฉ๋˜๊ธฐ๋„ ํ•˜์˜€์Šต๋‹ˆ๋‹ค

bounding box๋ฅผ ๊ทธ๋ฆด๋•Œ ์œ ์‚ฌํ•œ ํ”ฝ์…€์„ ๊ณ ๋ คํ•ด ์ถ”์ถœํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ๊ธฐ์กด์˜ window sliding๊ณผ ๋‹ค๋ฅด๊ฒŒ ์œ ์‚ฌํ•œ ํ”ฝ์…€๋ผ๋ฆฌ ๊ทธ๋ฃนํ•‘ํ•˜๊ณ  ๊ทธ ๋‹ค์Œ ํ”ฝ์…€๋ณด๋‹ค ๋” ํฌ๊ฒŒ mergeํ•˜์—ฌ Region proposal์„ ํ•ด์ฃผ๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.


bounding boxes ์ถ”์ถœ

์—ฌ๋Ÿฌ ๊ฐœ์˜ bounding boxes๊ฐ€ ์ƒ์„ฑ๋  ๊ฒฝ์šฐ ์ฃผ๋กœ ์‚ฌ์šฉ๋˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜

  • NMS(Non-Maximum Supression)
    • object detector์ด ์˜ˆ์ธกํ•œ ์—ฌ๋Ÿฌ ๊ฐœ์˜ bounding box  -> ์ค‘๋ณต์ œ๊ฑฐ
    • IoU(Intersection-over-Union)๊ฐ€ ํŠน์ • threshold ์ด์ƒ
    • ๊ฒน์น˜๋Š” confidence box ์ค‘์—์„œ ๊ฐ€์žฅ ๋†’์€ ๊ฒƒ๋งŒ ๋‚จ๊ธฐ๋Š” ์ž‘์—…    

NMS

 

๊ทธ๋ ‡๊ฒŒ object๊ฐ€ ์žˆ์„๋ฒ•ํ•œ ์˜์—ญ์„ ๋‹ค ์ถ”์ถœํ–ˆ๋‹ค๋ฉด, ๋งˆ์ง€๋ง‰์œผ๋กœ object detection ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์—ฌ๋Ÿฌ ๊ฐœ์˜ bounding boxes๋ฅผ ์ƒ์„ฑํ•ด ์ค๋‹ˆ๋‹ค. 

์ด bounding boxes ์ถ•์†Œํ•˜์—ฌ ์—ฐ์‚ฐ๋Ÿ‰์„ ์ค„์ด๊ณ , mAP(์„ฑ๋Šฅ์ง€ํ‘œ)๋„ ์˜ฌ๋ฆฌ๊ธฐ ์œ„ํ•ด์„œ NMS ๊ธฐ๋ฒ•์„ ์ ์šฉํ•ด object detection์„ ์ˆ˜ํ–‰ํ•ด ์ค๋‹ˆ๋‹ค. ๊ฐ„๋‹จํ•˜๊ฒŒ ๊ทธ๋ฆผ์„ ํ†ตํ•ด ๋ณด๋ฉด object detector์ด ์˜ˆ์ธกํ•œ bounding box ์ค‘์—์„œ ์ •ํ™•ํ•œ bounding box๋ฅผ ์„ ํƒํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค. ์ด๋•Œ ์ ์šฉํ•˜๋Š” ๊ธฐ๋ฒ•์ด๋ผ๊ณ  ๋ณด์‹œ๋ฉด ๋ฉ๋‹ˆ๋‹ค.

NMS๋Š” IoU๊ฐ€ ํŠน์ • threshold์ด์ƒ์—์„œ ์ˆ˜ํ–‰ํ•˜๊ฒŒ ๋˜๋Š”๋ฐ,  IoU๋ž€ ์ „์ฒด ๋ฐ•์Šค ์˜์—ญ ์ค‘ ๊ฒน์น˜๋Š” ๋ถ€๋ถ„์˜ ๋น„์œจ์„ ๋งํ•ฉ๋‹ˆ๋‹ค. (IoU: ๊ฐ„๋‹จํ•˜๊ฒŒ ํ•ฉ์ง‘ํ•ฉ ์ค‘ ๊ต์ง‘ํ•ฉ์ด๋ผ๊ณ  ์ƒ๊ฐ)

NMS ๊ณผ์ • code

1. bounding box์˜ confidence score ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌํ•œ ํ›„์— (line.2)

2. ๊ฐ€์žฅ confidence score๊ฐ€ ๋†’์€ bounding box์™€ IoU๊ฐ€ ์ผ์ • ์ด์ƒ์ธ boundingbox๋Š” ๋™์ผํ•œ ๋ฌผ์ฒด๋ฅผ detectํ–ˆ๋‹ค๊ณ  ํŒ๋‹จํ•˜์—ฌ ์ง€์šด๋‹ค.

3. ๊ทธ๋ ‡๊ฒŒ ์ตœ์ข…์ ์œผ๋กœ object๊ฐ€ ์žˆ์„๋ฒ•ํ•œ ์˜์—ญ์˜ bounding box๋ฅผ ์ถ”์ถœํ•ด ๋ƒ…๋‹ˆ๋‹ค.

 


Object Detection์˜ ๋ฐœ์ „๊ณผ์ •

  • 2012๋…„ AlexNet์˜ ๋“ฑ์žฅ๋ถ€ํ„ฐ object detection์—๋Š” ๋งŽ์€ ๋ชจ๋ธ๋“ค์ด ๋“ฑ์žฅํ•˜๊ฒŒ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
  • ํฌ๊ฒŒ classification๊ณผ regression์„ ์ˆœ์ฐจ์ ์œผ๋กœ ์ˆ˜ํ–‰ํ•˜๋Š” 2stage์˜ CNN ๊ณ„์—ด๊ณผ ์ด๋ฅผ ๋™์‹œ์— ์ˆ˜ํ–‰ํ•˜๋Š” 1stage์˜ Yolo, SSD ๊ณ„์—ด๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 

์ด ์™ธ์—๋„ object detection์— ์‚ฌ์šฉ๋˜๋Š” ๊ธฐ๋ฒ•๋“ค์€ object detection์˜ ๋ฐœ์ „ ๊ณผ์ •์„ ํ†ตํ•ด์„œ ๋” ์‚ดํŽด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 

 

๋‹ค์Œ ํฌ์Šคํ„ฐ๋ถ€ํ„ฐ ๋ณธ๊ฒฉ์ ์œผ๋กœ R-CNN์˜ ๋…ผ๋ฌธ ๋ฐ ์š”์•ฝ์œผ๋กœ Object Detection๋ฅผ ์‹œ์ž‘ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค! ๐Ÿ˜Š

728x90