Safety Situation Detection with Grounding DINO

This project presents an AI-powered safety alert system that uses the Grounding DINO model for context-aware object detection in CCTV footage. Unlike traditional detectors, which are limited to a fixed set of trained classes, the system is driven by “situation text prompts”: natural-language descriptions of potential risks (e.g., “a person lying on the ground” or “a fire in a building”). By aligning visual inputs with these prompts, the model can detect diverse and previously unseen danger scenarios without task-specific retraining. Detected events are processed in real time and delivered to users through a mobile application, enabling rapid response and improved situational awareness. The approach illustrates how combining vision-language models with prompt-based detection can make surveillance systems more flexible and scalable.
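
The prompt-driven flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual implementation: the names `build_prompt`, `detect_alerts`, `Alert`, and `stub_detector`, the `(label, score, box)` detection format, and the 0.4 score threshold are all hypothetical. The detector is passed in as a plain callable so that a real Grounding DINO wrapper could be substituted; the lowercase, period-separated prompt follows the text-query convention commonly used with Grounding DINO.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format
Detection = Tuple[str, float, Box]       # (matched phrase, confidence, box), assumed format


@dataclass(frozen=True)
class Alert:
    """One detected situation that should be pushed to the user (hypothetical type)."""
    situation: str
    score: float
    box: Box


def build_prompt(situations: Sequence[str]) -> str:
    """Join situation phrases into one lowercase, period-separated text query."""
    phrases = [s.strip().lower().rstrip(".").strip() for s in situations if s.strip()]
    return (" . ".join(phrases) + " .") if phrases else ""


def detect_alerts(
    frame: Any,
    situations: Sequence[str],
    detector: Callable[[Any, str], List[Detection]],
    min_score: float = 0.4,  # assumed threshold; tune per deployment
) -> List[Alert]:
    """Run the detector on one frame and keep confident matches, best first."""
    prompt = build_prompt(situations)
    if not prompt:
        return []
    alerts = [
        Alert(label, score, box)
        for label, score, box in detector(frame, prompt)
        if score >= min_score
    ]
    return sorted(alerts, key=lambda a: a.score, reverse=True)


def stub_detector(frame: Any, prompt: str) -> List[Detection]:
    """Stand-in for a real Grounding DINO call; returns fixed, made-up detections."""
    return [
        ("a fire in a building", 0.21, (0.0, 0.0, 50.0, 50.0)),
        ("a person lying on the ground", 0.62, (10.0, 20.0, 110.0, 80.0)),
    ]
```

Calling `detect_alerts(frame, ["A person lying on the ground", "a fire in a building"], stub_detector)` with the stub keeps only the detection that clears the threshold. In a real deployment, the resulting `Alert` objects would be forwarded to the mobile application by whatever notification channel the system uses.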