Reliability of a Wearable Motion System for Clinical Evaluation of Dynamic Lumbar Spine Function

Background: Low back pain is the leading cause of disability worldwide. Subjective assessments are often used to assess extent of functional limitations and treatment response. However, these measures have poor sensitivity and are influenced by the patient’s perception of their condition. Currently, there are no objective tools to effectively assess the extent of an individual’s functional disability and inform clinical decision-making. Objective: The purpose of this study was to evaluate the reliability of a wearable motion system based on Inertial Measurement Unit (IMU) sensors for use in quantifying low back function. Methods: Low back motion assessments were conducted by 3 novice raters on 20 participants using an IMU-based motion system. These assessments were conducted over 3 days with 2 days of rest in between tests. A total of 37 kinematic parameters were extracted from the low back motion assessment in all three anatomical planes. Intra-rater and inter-rater reliability were assessed using Intraclass Correlation Coefficients (ICCs) calculated from repeated measures, mixed-effects regression models. Results: Lumbar spine-specific kinematic parameters showed moderate to excellent reliability across all kinematic parameters. The ICC values ranged between 0.84–0.93 for intra-rater reliability and 0.66 – 0.83 for inter-rater reliability. In particular, velocity measures showed higher reliabilities than other kinematic variables. Conclusion: The IMU-based wearable motion system is a valid and reliable tool to objectively assess low back function. This study demonstrated that lumbar spine-specific kinematic metrics have the potential to provide good, repeatable metrics to assess clinical function over time.


Introduction
Among all Musculoskeletal Disorders (MSDs), Low Back Pain (LBP) is the leading cause of long-term disability and the most prevalent condition, with a lifetime prevalence of 65-80% in U.S. adults [1]. LBP prevalence is on the rise, imposing a financial burden on employers and healthcare systems [2,3]. Identifying root causes of LBP has been a challenge for physicians and researchers, as LBP is associated with many different biopsychosocial factors [4]. Traditionally, practitioners have used subjective methods to quantify back pain, such as questionnaires or surveys [5][6][7][8]. While these methods may provide insights into perceptions of pain intensity, they are not objective measures that describe biomechanical or functional dimensions of LBP. Therefore, a simple and reliable clinical tool that holistically evaluates the lumbar spine and provides an objective snapshot of its functional status may be helpful in stratifying LBP patients, monitoring treatment outcomes, and enhancing clinical decision-making to improve long-term outcomes.
Several biomechanical tools have been developed to assess the functional state of a patient with LBP. Goniometers and digital inclinometers are frequently used to assess lumbar range of motion [9][10][11][12][13]. The reliability of goniometric measures, assessed via Intraclass Correlation Coefficients (ICCs), ranges from 0.85 to 0.99, and exceeds 0.95 for digital inclinometry [11]. However, while these measurement tools provide reliable measures of flexibility, they only provide two-dimensional kinematic information under static conditions and do not necessarily capture the complex dynamics of functional human motion. Moreover, the sensitivity and specificity of lumbar range of motion for differentiating between LBP patients and healthy cohorts are both poor and questionable [14].
In contrast, Inertial Measurement Units (IMUs) are portable and cost-effective motion sensors that can capture three-dimensional dynamic movement data and have the potential to serve as an objective tool to evaluate the functional status of the spine [15]. The reliability and performance of IMUs compared to the gold standard optical motion capture system have been evaluated in prior literature with promising results. Beange et al. [16] recruited 10 subjects to perform 35 cycles of repetitive spine flexion-extension. The reliability of IMUs relative to an optical motion capture system for measuring features associated with spine movement and motor control was strong, with ICC values ranging from 0.81 to 0.96. A similar study by Bauer et al. [17] investigated the performance of an IMU system for the assessment of movement dysfunctions by comparing its concurrent validity against an optoelectronic system. Their results demonstrated that, compared to the optoelectronic system, the IMU system is valid for estimates of trunk movement in the primary movement direction. Finally, Kang et al. [18] examined the validity of IMUs when measuring the mean postural angles for thorax and head flexion and shoulder girdle elevation during gait in 7 healthy individuals. Bland-Altman analysis and ICC values (>0.73) showed promising results, confirming the agreement of the IMUs with the motion capture system.
Given the need for objective measures and the ability of IMUs to provide useful, reliable measures, the authors recently developed a novel IMU-based lumbar spine wearable device (Figure 1) for clinical use to provide a direct objective measure of functional spine health. While wearing the device, an individual performs a standardized 10-minute functional motion assessment. The device automatically captures the individual's unique motion signature while they perform this series of standardized multi-planar, lumbar-specific functional motion tasks. The motion signature consists of three-dimensional (3D) lumbar range of motion (RoM) as well as dynamic features such as velocities and accelerations. Collectively, these lumbar-specific motion signature data provide a window into the functioning of an individual's musculoskeletal system and can serve as indicators of functional health.
However, the reliability of this IMU-based functional motion assessment needs to be further investigated to better inform clinical utility and acceptance. Therefore, the purpose of this study was to quantify the reliability of lumbar kinematic features captured within an integrated software platform that interacts with an IMU-based lumbar spine wearable device. Inter- and intra-rater reliability were quantified with healthy controls to understand the total variance across raters at different time points. The results of this study form a foundation for future work that will use the same or similar IMU-based platforms to quantify low back function.

Subjects
A total of 20 healthy participants were evaluated in this study, consisting of 10 males and 10 females aged 18-64 years. A sample of twenty participants was chosen as the number needed to determine statistical significance, as established by previous studies [19].
Inclusion criteria: Eligible participants were between 18 and 80 years old. Participants had to be able to stand for 25 minutes at a time and to speak, read, and understand English so that instructions and safety precautions were understood.
Exclusion criteria: Participants were excluded if they had back pain or a history of chronic low back pain, a history of any severe brain or spine pathologies, uncorrected vision impairment, uncorrected hearing impairment, vestibular conditions requiring medical treatment, bone fractures within the last 3 months, or chemotherapy or radiation therapy in the last 3 months.

Clinical lumbar motion assessment
Software and hardware components: A unique cloud-based software application was used for all data collection, storage, review, filtering, sorting, and processing operations. The functional lumbar spine motion assessment was performed using commercially available IMU sensors (XSens MTw2) integrated into custom lightweight flexible harnesses. The wearable device was designed specifically for collecting three-dimensional lumbar-specific kinematics non-invasively. The system was designed for safe human contact, with the sensors mounted on harnesses worn over clothing. Together, the hardware and software components allow accurate recording of kinematic variables, aided by built-in sensor drift correction algorithms [20].
Study design: A repeated measures experimental design with three raters was used to assess intra-rater and inter-rater reliability. The raters were inexperienced first-time users of the technology. As part of the study protocol, they were briefly trained on the use of the technology prior to subject enrollment and data collection. Each participant was evaluated independently by the three raters on three separate days (9 data collection sessions in total). A 30-minute rest period was provided to each participant between motion tests conducted by different raters on the same day. In addition, two days of rest were required between test days to minimize the impact of learning and fatigue effects on participants. Rater order was counterbalanced across days and participants. Raters were blinded to tests performed by other raters. A typical participant schedule is shown in Table 1.
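One common way to counterbalance rater order is a cyclic Latin square, in which the order rotates by one position for each day and each participant. The sketch below only illustrates that idea. The study does not state which counterbalancing scheme was used, and the rater labels and the `rater_order` helper are hypothetical.

```python
# Hypothetical rater labels; the study reports three raters but does not name them.
RATERS = ("A", "B", "C")


def rater_order(participant_index, day_index, raters=RATERS):
    """Return the rater order for one participant on one test day.

    Assumption: a cyclic Latin square that rotates the order by one
    position for each successive day and each successive participant.
    """
    shift = (participant_index + day_index) % len(raters)
    return raters[shift:] + raters[:shift]


# Schedule for the first participant across the three test days.
for day in range(3):
    print(f"Day {day + 1}: {' -> '.join(rater_order(0, day))}")
```

With this rotation, each rater occupies the first, second, and third slot exactly once per participant across the three days. A cyclic square balances position only; it does not balance which rater follows which.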

Detailed procedures
Prior to enrollment, the study details were explained, and each participant signed a consent form. Sufficient time was given to participants to review their consent forms and ask additional questions. Upon enrollment into the study, each participant completed a series of questionnaires that captured information on their current pain, fatigue, and low back stiffness. On completion of these questionnaires, the participants performed a 10-minute functional motion assessment using a wearable IMU-based system. For the motion assessment, lightweight harnesses (belt and vest) with mounted IMU sensors were placed on the upper torso and pelvis of participants as shown in Figure 1. Then, each participant was asked to perform a series of pre-defined low back motions in each of three anatomical planes (lateral, axial, and sagittal). In the lateral motion trials, subjects leaned their upper body to the left and right without moving their pelvis. In the axial motion trials, subjects rotated their upper body to the left and right while avoiding pelvis movement. In the sagittal motion trials, subjects flexed their upper back forward and then returned to the starting position. The software provided verbal instructions for each trial and asked participants to perform the motions as fast or as far as they comfortably could. The software also permitted practice for each motion task and provided advanced instructions if the tasks were not performed properly. The first three motion trials captured multi-planar range of motion (flexibility), and the latter five trials captured dynamic motions (see Supplemental Section for details on motion assessment). These motions were recorded directly to the custom software application. Each participant completed the study after a total of 9 data collection sessions.
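As a rough illustration of how range of motion and dynamic features can be derived from a recorded angle trace, the sketch below uses simple finite differences. It is not the system's processing pipeline, since the paper defers its feature definitions to the appendix. The function names, the single-plane angle input, and the sampling rate are assumptions made for the example.

```python
def finite_difference(samples, fs_hz):
    """First difference of a uniformly sampled signal, scaled to per-second units."""
    return [(b - a) * fs_hz for a, b in zip(samples, samples[1:])]


def kinematic_features(angle_deg, fs_hz):
    """Summarize one single-plane angle trace (degrees) sampled at fs_hz.

    Assumption: velocity and acceleration are approximated by first and
    second finite differences; a real pipeline would typically filter first.
    """
    if len(angle_deg) < 3:
        raise ValueError("need at least three samples to estimate acceleration")
    velocity = finite_difference(angle_deg, fs_hz)
    acceleration = finite_difference(velocity, fs_hz)
    return {
        "rom_deg": max(angle_deg) - min(angle_deg),
        "peak_velocity_deg_s": max(abs(v) for v in velocity),
        "peak_acceleration_deg_s2": max(abs(a) for a in acceleration),
    }


# Synthetic trace (not study data): angle grows quadratically, sampled at 1 Hz.
features = kinematic_features([0.0, 1.0, 4.0, 9.0, 16.0], fs_hz=1.0)
print(features)
```

Differencing amplifies sensor noise, which is one reason acceleration-based features may be less repeatable than position-based ones in practice.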

Statistical analysis
Low back motion signals were characterized by a series of features, including range of motion, mean or peak velocities, and mean or peak accelerations. Details on these features are described in Section 10 of the appendix. Intraclass Correlation Coefficients (ICCs) were calculated to investigate intra-rater and inter-rater reliability of low back motion assessments using repeated measures, mixed-effects regression models. All ICCs [ICC (2,1) and ICC (3,1)] were calculated and reported in accordance with Koo and Li [21]. Based on their guidelines, ICC values less than 0.5 indicate poor reliability, values between 0.5 and 0.75 indicate moderate reliability, values between 0.75 and 0.9 indicate good reliability, and values above 0.9 indicate excellent reliability. All analyses were performed using SAS v9.4 and R. The type I error rate was set at alpha = 0.05.

Results
Table 2 displays the baseline demographic characteristics of the participants. The sample comprised 10 males and 10 females aged 18-64 years. In total, 70% (n=14) were white, 25% (n=5) were Asian, and 5% (n=1) were unknown. The mean height and weight of the participants were 171.5±9.6 cm (67.5±3.8 inches) and 69.8±11.9 kg (153.8±26.2 pounds), respectively.
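The two ICC forms named in the statistical analysis can be illustrated with the classical two-way ANOVA mean-square formulas. This is a swapped-in textbook method, not the mixed-effects SAS/R models the study used, and the `icc_two_way` function is a hypothetical helper. ICC(2,1) treats raters as random and measures absolute agreement, whereas ICC(3,1) treats raters as fixed and measures consistency. The ratings table is the small six-subject, four-judge worked example from Shrout and Fleiss, not data from this study.

```python
def icc_two_way(scores):
    """Return (ICC(2,1), ICC(3,1)) for a subjects-by-raters table.

    scores: list of rows, one row per subject and one column per rater.
    Uses two-way ANOVA mean squares (single-measurement forms).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc_2_1 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    icc_3_1 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc_2_1, icc_3_1


# Illustrative ratings (rows = subjects, columns = judges); not study data.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_2_1, icc_3_1 = icc_two_way(ratings)
print(f"ICC(2,1) = {icc_2_1:.2f}, ICC(3,1) = {icc_3_1:.2f}")
# prints: ICC(2,1) = 0.29, ICC(3,1) = 0.71
```

The gap between the two values in this example comes from systematic offsets between judges, which ICC(2,1) penalizes and ICC(3,1) ignores.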

Intra-rater reliability
Overall, we analyzed the reliability of 37 kinematic measures captured during functional motion. Table 3 displays the intra-rater ICC estimates and 95% confidence intervals as a function of each of the cardinal planes for the various motion tasks. The intra-rater reliability for mean flexibility (RoM) in the axial plane was moderate to excellent with a mean ICC=0.85 (95% CI: 0.56-0.96), whereas mean axial velocity and acceleration showed higher intra-rater reliabilities with ICC=0.89 (95% CI: 0.80-1.00) and ICC=0.87 (95% CI: 0.80-0.94), respectively. For the lateral plane, all motion metrics (flexibility, velocity, and acceleration) yielded reliability estimates ranging from good to excellent. More specifically, ICC estimates for mean lateral flexibility, velocity, and accelerations were 0.91 (95% CI: 0.84-0.94), 0.92 (95% CI: 0.86-0.96), and 0.87 (95% CI: 0.78-0.95), respectively. In addition, all intra-rater reliability values for the sagittal plane of the body resulting from the sagittal symmetric task, sagittal bending while twisted Clockwise (CW) task, and sagittal bending while twisted Counterclockwise (CCW) task yielded good to excellent results with mean ICC estimates ranging between 0.86-0.93 for flexibility, velocity, and acceleration metrics. Collectively, the intra-rater ICC estimates suggest good to excellent reliability for all metrics of velocity and acceleration as well as flexibility except in the axial plane of the body.
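The verbal labels used in this section appear to follow from applying the Koo and Li bands to the bounds of each 95% confidence interval. A minimal sketch of that mapping is given below. The helper names are hypothetical, and the handling of values that fall exactly on a band boundary (0.75 here counted as good) is an assumption, since the stated bands overlap at their edges.

```python
def koo_li_category(icc):
    """Map an ICC value to a Koo and Li reliability band.

    Assumption: 0.5 counts as moderate, 0.75 as good, and 0.9 as good.
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def describe(estimate, ci_low, ci_high):
    """Label an ICC by the bands its confidence interval spans."""
    low, high = koo_li_category(ci_low), koo_li_category(ci_high)
    label = low if low == high else f"{low} to {high}"
    return f"ICC={estimate:.2f} ({label})"


# Values taken from the intra-rater results reported above.
print(describe(0.85, 0.56, 0.96))  # mean axial flexibility
print(describe(0.91, 0.84, 0.94))  # mean lateral flexibility
```

For these two inputs the sketch returns "moderate to excellent" and "good to excellent", matching the descriptions given in the text for axial and lateral flexibility.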

Inter-rater reliability
The inter-rater reliability findings (Table 4) showed a slightly different pattern from the intra-rater findings. Mean axial flexibility showed poor to good inter-rater reliability with ICC=0.69 (95% CI: 0.29-0.87), whereas mean axial velocity showed moderate to excellent reliability with ICC=0.77 (95% CI: 0.62-1.00). The mean axial acceleration metric also showed moderate to excellent reliability with ICC=0.73 (95% CI: 0.60-1.00). In the lateral plane, kinematic metrics produced moderate to good inter-rater reliability, with mean ICC estimates ranging from 0.60 to 0.80. For the sagittal symmetric tasks, ICC values for the sagittal plane flexion and extension flexibility and velocity metrics ranged from 0.76 to 0.80. The flexion and extension sagittal acceleration measures also yielded reliability in the moderate range (ICC values of 0.71-0.72), whereas mean sagittal acceleration yielded moderate to excellent inter-rater reliability with ICC=0.74 (95% CI: 0.60-1.00). For the sagittal asymmetric tasks performed while twisted counterclockwise, reliability ranged from moderate to good for all flexion and extension velocity and acceleration measures (ICC values of 0.75-0.83). Finally, the same tasks performed while twisted clockwise resulted in good reliability for all velocity metrics and moderate reliability for all acceleration metrics. In the clockwise-twisted task, extension sagittal velocity showed the best reliability among velocity metrics with ICC=0.77 (95% CI: 0.63-0.86), and mean sagittal acceleration showed the best reliability among acceleration metrics with ICC=0.75 (95% CI: 0.59-1.00).

Reliability patterns among metrics
When considering the inter-rater and intra-rater results together, a pattern in the ICC distributions was evident. Specifically, velocity measures were distributed towards higher ICCs for all tasks compared with both flexibility and acceleration metrics in all planes. Figures 2-4 plot inter-rater against intra-rater ICCs of the various kinematic metrics for the axial, lateral, and sagittal planes, respectively. These plots are intended to show the ICC estimates together so that the relative relationship among the various kinematic metrics can be appreciated. In these figures, poor, moderate, good, and excellent reliability regions are displayed in red, yellow, light green, and dark green, respectively, so that the intra-rater and inter-rater mean ICC estimates can be considered collectively. Positional measures are shown with circles, measures related to velocity with triangles, and measures related to acceleration with squares. These figures indicate that the mean ICC estimates for all kinematic metrics were in the good or excellent range in all anatomical planes. Two main trends can be noted from these plots. First, the inter-rater and intra-rater ICC estimates within the sagittal and lateral planes of the body were generally better than those for the axial plane. Second, in all planes of the body, ICCs were better for the velocity metrics than for the other kinematic metrics in nearly every instance (Figure 5).

Discussion
The current study evaluated the reliability of an IMU-based wearable motion system to provide quantitative kinematic metrics indicative of low back function. Our results showed the extent to which three-dimensional lumbar kinematics were reliable and can serve as useful functional indicators of spine health. Overall, the ICC values for the wearable motion system showed good to excellent intra-rater and moderate to good inter-rater reliabilities. Two dimensions of reliability, intra-rater and inter-rater, were assessed in this study (Figure 6). The results demonstrated that many spine kinematic metrics derivable from the various functional motion tasks yield "moderate" to "excellent" intra-rater and "moderate" to "good" inter-rater reliability. Notably, in both the intra-rater and inter-rater results, the velocity-related metrics had ICC values that were generally greater than those of other kinematic parameters such as RoM or acceleration metrics. Intra-rater reliability was particularly good in the lateral and sagittal planes. This information should be especially useful when deriving composite metrics of functional spine status that combine some of these metrics. It should also be noted that the wearable system used here has been shown to measure spine motion accurately. Compared with a high-fidelity infra-red motion capture system, its accuracy was 98.1%, 99.5%, and 99.0% when measuring position, velocity, and acceleration, respectively [20]. Thus, the reliability measures reported here primarily reflect variation between raters and variation within subjects over time rather than measurement error of the sensors.
The ICC estimates in our study compared favorably with (and in many cases were slightly better than) those of other studies that have reported ICCs for the torso and other parts of the body. Marras et al. [14] investigated the repeatability of trunk kinematic metrics using a wearable lumbar motion monitor in 20 subjects repeating the same tasks over the course of five weeks with one rater. Their study reported Cronbach's alpha coefficients as a reliability measure for various kinematic measures in the sagittal and lateral planes of the body. As in the current study, Marras et al. found very good repeatability for velocity and acceleration in the sagittal plane and moderate repeatability in the lateral plane of the body. Simmonds et al. [22] assessed test-retest reliability of physical performance using a series of trunk flexion movements with one rater, 44 low back pain patients, and 48 healthy subjects. Reported day-to-day reliability (ICC) values ranged from 0.59 to 0.88 in the low back pain group and from 0.46 to 0.76 in the control group. The ICCs reported in our current study were generally better than these. Another study examined the repeatability of trunk strength rather than the kinematics examined in the current study. Rabelo et al. [23] investigated inter-rater and test-retest reliability of trunk strength and endurance measures from an isokinetic dynamometer. In that study, 13 patients and 13 healthy subjects performed a series of trunk flexion and extension movements using different movement protocols. The reported ICC values ranged from 0.59 to 0.99. This further suggests that the vast majority of the kinematic metrics assessed via our IMU-based wearable system were generally more reliable, especially considering the lower bounds of the ICCs reported by Rabelo et al.
The ICCs reported in our study were slightly higher for intra-rater than for inter-rater reliability. This implies a slightly higher degree of agreement between days of testing (time points) than between raters. In addition, variability of measures might also be due to the subjects varying day to day rather than the raters or testing method itself. Subjects did not report significant changes in stiffness or pain; thus, it is possible that uncontrolled external factors, such as sleep, nutrition, exercise outside of the study, or psychosocial fluctuations, affected the subjects' movement. It is also possible that a learning effect occurred between sessions (Table 5). More in-depth analyses suggested slightly worse agreement in metrics on the first day of testing compared to subsequent test days. We suspect that subjects became familiar with the motions after completing them multiple times. Despite these limiting factors, the findings of good to excellent intra-rater ICCs and moderate to good inter-rater ICCs for the motion metrics indicate relative consistency between and within raters. Hence, overall, these results reflect a reliable set of kinematic metrics that objectively reflect low back function.
There were several limitations to this study. As mentioned previously, some external factors that might have affected motion assessments, such as sleep, nutrition, exercise, or psychosocial factors, were not measured. Some of these effects might be mitigated through the study design and calculations of reliability because motion assessments administered on the same day would likely not exhibit large variations in these factors. Another potential limitation was that the motion signals were not explored for more complex features, beyond traditional kinematic measures, that could prove to be reliable metrics. However, even with these potential limitations, this study demonstrates that many of the metrics reported herein can serve as reliable indicators of low back function and have the potential to provide useful clinical information.

Conclusion
The lumbar spine measurement system discussed herein demonstrated good inter-rater and intra-rater reliability as assessed via ICCs. For each measure, the intra-rater reliability was slightly greater than the inter-rater reliability. The intra-rater reliability ranged from moderate to excellent among the various kinematic metrics, with most of these measures indicating good to excellent reliability. Similarly, inter-rater reliability ranged from moderate to good, with the exception of some of the axial plane flexibility metrics, which showed poor to good repeatability. Both velocity and acceleration measures in the different anatomical planes had higher reliability scores than the flexibility measures. These reliability results suggest that these lumbar spine-specific kinematic metrics have high reliability for evaluating low back function for different raters on the same day. The metrics also displayed some variability between days that was most likely a result of subtle subject changes between days. Given this information, adjustments to testing protocols might further improve the measurement (Table 6). However, overall, this study has demonstrated that kinematic metrics as measured by this IMU-based wearable device can provide very good, repeatable metrics to assess the functional low back status of individuals over time.

Inter- versus intra-rater reliability in the axial plane. Poor, moderate, good, and excellent reliability regions are displayed in red, yellow, light green, and dark green colors, respectively. Positional, velocity-related, and acceleration-related measures are shown with circles, triangles, and squares.
Inter- versus intra-rater reliability in the lateral plane. Poor, moderate, good, and excellent reliability regions are displayed in red, yellow, light green, and dark green colors, respectively. Positional, velocity-related, and acceleration-related measures are shown with circles, triangles, and squares.
Inter- versus intra-rater reliability in the sagittal plane. Poor, moderate, good, and excellent reliability regions are displayed in red, yellow, light green, and dark green colors, respectively. Positional, velocity-related, and acceleration-related measures are shown with circles, triangles, and squares.
Lateral motion data collected with device.
Table 5.

Term Interpretation
Axial movement (

Measure Description

The peak velocity as subject bends back to neutral from a flexed position in the sagittal plane while fully twisted CCW

CCW Sagittal Flexion Velocity
The peak velocity as the subject bends forward in the sagittal plane from a neutral position while fully twisted CCW

CCW Sagittal Mean Velocity
The mean peak velocity as subject bends forward and back to neutral in the sagittal plane while fully twisted CCW

CCW Sagittal Extension Acceleration
The peak acceleration a subject has while bending back to neutral in the sagittal plane when fully twisted CCW

CCW Sagittal Flexion Acceleration
The peak acceleration as the subject bends forward in the sagittal plane when fully twisted CCW

CCW Sagittal Mean Acceleration
The mean peak acceleration as the subject bends forward and back to neutral in the sagittal plane while fully twisted CCW

CW Sagittal Extension Velocity
The peak velocity as the subject bends back to neutral in the sagittal plane while fully twisted CW

CW Sagittal Flexion Velocity
The peak velocity as the subject bends forward in the sagittal plane while fully twisted CW

CW Sagittal Mean Velocity
The peak velocity as the subject bends forward and back to neutral in the sagittal plane while fully twisted CW

CW Sagittal Extension Acceleration
The peak acceleration as the subject bends back to neutral in the sagittal plane while fully twisted CW

CW Sagittal Flexion Acceleration
The peak acceleration as the subject bends forward in the sagittal plane while fully twisted CW

CW Sagittal Mean Acceleration
The peak absolute (regardless of direction) acceleration as the subject bends forward and back to neutral in the sagittal plane while fully twisted CW

Hani et al.