Kailai Sun*
Center for Intelligent and Networked Systems, Department of Automation, BNRist, Tsinghua University, Beijing, China
*Corresponding author: Kailai Sun, Center for Intelligent and Networked Systems, Department of Automation, BNRist, Tsinghua University, Beijing, China
Submission: January 27, 2023; Published: February 07, 2023
ISSN: 2639-0574 Volume 5 Issue 3
Buildings currently account for 36% of global energy consumption and 37% of global greenhouse gas emissions, according to U.N. data [1]. The human factor plays a crucial role in reducing carbon emissions in buildings toward the goal of zero emissions by 2050. Many studies have shown that approximately 10% to 40% of building energy consumption can be saved with occupancy information [2,3], so accurate occupancy estimation has become an increasingly active research topic. Meanwhile, with the rapid development of artificial intelligence (AI) and computer vision, image/video analysis technologies have been widely applied in buildings. This mini review presents the state of the art in vision-based building occupancy estimation; finally, challenges and future trends are discussed.
In terms of the captured visual information, vision-based building occupancy estimation methods can be divided mainly into scene-based counting and line-based counting. In terms of camera installation location, there are two visual settings: the room interior and the entrance.
(1) For scene-based counting methods (SCMs), cameras are usually installed inside rooms, and the captured videos are analyzed with AI and computer vision techniques. Most studies apply people detection algorithms (e.g., YOLO [4]) to estimate indoor occupancy; these are mainly divided into body detection and head detection. Body detection methods extract hand-crafted or deep features for body recognition [5]. In complex indoor scenes, head detection has gradually become the mainstream SCM approach because heads are more visible [6]. As for detectors, most studies apply general object detectors, but many propose specific detectors that incorporate occupant knowledge (e.g., head size [7] and head motion information [8]). However, because of complex environments, many other objects are misrecognized as occupants (i.e., false positives), and it is hard to deploy cameras that cover an entire room without occlusion. Moreover, estimating occupancy with single-frame detectors is unstable, so many studies adopt multi-frame methods to enhance features and remove irregular estimation results [9].
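As a rough illustration of such an SCM pipeline, the following Python sketch couples a generic pretrained person detector with a sliding-window median to suppress unstable single-frame counts. It assumes the ultralytics package is available; the model name, confidence threshold, window size, and video file are illustrative assumptions rather than values from the cited studies.

```python
# Minimal scene-based counting (SCM) sketch: per-frame person detection plus a
# sliding-window median to smooth out irregular single-frame estimates.
from collections import deque
import cv2
from ultralytics import YOLO  # assumed available; any detector producing boxes would do

model = YOLO("yolov8n.pt")          # generic COCO person detector (class 0); assumption
window = deque(maxlen=15)           # ~0.5 s of history at 30 fps; assumption

cap = cv2.VideoCapture("room_camera.mp4")  # hypothetical indoor camera feed
while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, classes=[0], conf=0.5, verbose=False)[0]
    window.append(len(result.boxes))             # raw per-frame occupant count
    smoothed = sorted(window)[len(window) // 2]  # median over recent frames
    print(f"estimated occupancy: {smoothed}")
cap.release()
```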
(2) For line-based counting methods (LCMs), cameras are usually installed at room entrances. Most studies detect and track occupants at the entrance and estimate building occupancy by counting passing events: they segment the foreground by background subtraction, track occupant motion with trackers (Kalman filter, Deep SORT, etc.), and then determine from the moving direction whether occupants enter or leave through the door [10]. Entrance cameras are usually installed in one of two views: the side view or the overhead view. In the side view, LCMs detect and track the occupant's body [11]; in the overhead view, LCMs recognize the head and shoulders [12]. However, errors may occur when many occupants pass through the entrance simultaneously [13], and once an occupant is misrecognized, the error accumulates until it is manually cleared.
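The counting step itself can be reduced to a simple crossing test on tracked centroids, as in the sketch below. The virtual line position and the "downward means entering" convention are illustrative assumptions; in practice the centroids would come from the tracker mentioned above.

```python
# Minimal line-based counting (LCM) sketch: count entry/exit events when a tracked
# centroid crosses a virtual line at the doorway.
LINE_Y = 240  # virtual counting line (pixels); assumed doorway position

def update_count(prev_y, curr_y, occupancy):
    """Increment/decrement room occupancy when a track crosses the line."""
    if prev_y < LINE_Y <= curr_y:          # moved downward across the line -> entered
        occupancy += 1
    elif prev_y >= LINE_Y > curr_y:        # moved upward across the line -> left
        occupancy = max(occupancy - 1, 0)  # clamp to avoid negative counts
    return occupancy

# Example: one track walking in, another walking out.
occupancy = 0
for prev_y, curr_y in [(200, 250), (300, 230)]:
    occupancy = update_count(prev_y, curr_y, occupancy)
print(occupancy)  # 0
```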
(3) To mitigate the above limitations and improve estimation performance, many studies have developed fusion methods [13-15]. They combine the heterogeneous visual information of LCMs and SCMs to eliminate cumulative errors and irregular estimation results. Considering scene knowledge and the number of indoor occupants, they adjust or automatically switch between LCMs and SCMs at the person level to obtain more fine-grained estimation results.
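One simple way to picture such fusion is a correction rule in which the entrance counter drives the running estimate while periodic scene-based estimates clear accumulated drift; the tolerance and switching rule below are illustrative assumptions, not a method from the cited works.

```python
# Rough fusion sketch: use the fine-grained LCM count unless the SCM estimate
# disagrees strongly, in which case reset to the scene-based value.
def fuse(lcm_count, scm_count, tolerance=2):
    """Return a corrected occupancy estimate from the two counters."""
    if scm_count is not None and abs(lcm_count - scm_count) > tolerance:
        return scm_count   # trust the scene estimate to clear cumulative error
    return lcm_count       # otherwise keep the entrance-based count

print(fuse(7, 4))     # 4: large disagreement, SCM correction applied
print(fuse(5, 5))     # 5: counts agree, LCM kept
print(fuse(5, None))  # 5: no scene estimate available this cycle
```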
Although existing methods have made remarkable progress toward accurate occupancy estimation for building energy saving, they suffer from inherent limitations:
A. Complex indoor scenes, occlusion, and illumination severely affect SCMs. Occupants are often occluded by other objects (e.g., tables, chairs, computers), and moving occupants cause significant variations in scale, pose, texture, and illumination.
B. Datasets are critical for AI, yet public visual occupancy datasets for buildings are lacking.
C. Vision-based methods provide fine-grained information but raise privacy concerns.
D. Many studies apply deep neural networks and achieve state-of-the-art (SOTA) performance; however, making these detectors and trackers reliable when deployed in real buildings remains a major challenge.
E. Cumulative errors are hard to clear using LCMs alone.
To address the above challenges, future research can focus on the following aspects:
a) Developing advanced sensor fusion technologies based on machine learning for occupancy estimation.
b) Collecting and establishing multi-modal occupancy datasets in buildings.
c) Verifying and testing deep neural networks, including adversarial attack and defense, before practical deployment.
d) Applying neural network compression technologies to AI and IoT edge computing devices to enable smart buildings and reduce communication delay.
e) Adopting federated learning to meet the requirements of user privacy protection and data security. In particular, each edge device collects data and trains a local machine learning model, and only uploads model parameters to the server, which largely reduces the risk to data privacy (see the sketch after this list).
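The following toy federated-averaging sketch illustrates the privacy point in item e): each edge device trains on its own data and only parameters are averaged on the server, so raw images never leave the building. The linear occupancy regressor, layer shapes, client count, and synthetic data are illustrative assumptions.

```python
# Toy federated averaging (FedAvg) sketch: local training on edge devices,
# parameter-only aggregation on the server.
import numpy as np

def local_update(weights, features, labels, lr=0.1):
    """One gradient step of a linear occupancy regressor on local (private) data."""
    preds = features @ weights
    grad = features.T @ (preds - labels) / len(labels)
    return weights - lr * grad

def fed_avg(client_weights):
    """Server step: average parameters only; no raw data is ever uploaded."""
    return np.mean(client_weights, axis=0)

rng = np.random.default_rng(0)
global_w = np.zeros(4)
for _ in range(10):                              # communication rounds
    updates = []
    for _ in range(3):                           # three hypothetical edge devices
        X = rng.normal(size=(32, 4))             # local sensor features (synthetic)
        y = X @ np.array([1.0, 0.5, -0.2, 0.3])  # synthetic occupancy targets
        updates.append(local_update(global_w.copy(), X, y))
    global_w = fed_avg(updates)
print(global_w)
```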
Occupancy information is important for building HVAC control and energy saving. This paper reviews recent vision-based occupancy estimation methods, including their technical details and limitations. Challenges and future trends are presented, covering datasets, edge computing, and federated learning.
© 2023 Kailai Sun. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and building upon the work non-commercially.