Summary
The project detected each class of content (defined by us) on magazine pages, ran OCR on the box detected for each class, and assembled a structured JSON file containing all the information in the magazine. It covered the whole pipeline, from data annotation to delivery of a Docker file, with both a web interface and a CLI API.
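The detect, OCR, and assemble flow can be sketched as follows. This is a minimal illustration, not the project's actual code: the `Detection` type, the `build_magazine_json` function, the class names, and the JSON layout are all assumptions, and the OCR engine is passed in as a plain callable so that any engine could be plugged in.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; a hypothetical box convention.
Box = Tuple[int, int, int, int]


@dataclass
class Detection:
    page: int    # page number within the magazine
    label: str   # hypothetical class name, e.g. "title" or "body"
    box: Box     # bounding box produced by the object detector


def build_magazine_json(
    detections: List[Detection],
    ocr: Callable[[int, Box], str],
) -> str:
    """Run OCR on every detected box and group the results by page.

    `ocr` is a stand-in for a real OCR call: it receives a page number
    and a box and returns the recognized text.
    """
    pages: Dict[int, List[dict]] = {}
    # Order regions by page, then top-to-bottom, then left-to-right.
    ordered = sorted(detections, key=lambda d: (d.page, d.box[1], d.box[0]))
    for det in ordered:
        pages.setdefault(det.page, []).append(
            {
                "class": det.label,
                "box": list(det.box),
                "text": ocr(det.page, det.box).strip(),
            }
        )
    document = {
        "pages": [{"page": p, "regions": r} for p, r in sorted(pages.items())]
    }
    return json.dumps(document, ensure_ascii=False, indent=2)
```

Keeping the OCR step behind a callable makes the assembly logic testable without images or a trained detector.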
Skills Developed
Computer Vision
Object Detection
OCR
Docker
Web Interface Development
CLI API
Data Annotation
Main Challenges
The main challenge in this project was determining where an article started and ended, which required several adjustments using digital image processing. Handling the variability in magazine layouts and extracting text accurately from different content types were also significant technical hurdles.
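To make the article-boundary problem concrete, here is one simple heuristic, shown only as an illustration: walk the detected regions in reading order and start a new article whenever a title region appears. This is not the image-processing approach the project used. The `group_into_articles` function, the `"class"` key, and the `"title"` label are hypothetical.

```python
from typing import List


def group_into_articles(
    regions: List[dict],
    start_label: str = "title",
) -> List[List[dict]]:
    """Split regions (already in reading order) into articles.

    A region whose "class" equals `start_label` opens a new article.
    Regions that appear before the first title form their own group.
    """
    articles: List[List[dict]] = []
    current: List[dict] = []
    for region in regions:
        # Close the running article when a new title is encountered.
        if region["class"] == start_label and current:
            articles.append(current)
            current = []
        current.append(region)
    if current:
        articles.append(current)
    return articles
```

A rule like this breaks down on layouts where an article continues on a later page or where titles are missed by the detector, which is the kind of case that calls for additional adjustments.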