One Minute Press Up Test Reliability Essay

What do grade schoolers, law enforcement personnel and soldiers have in common? They're all subject to regular tests to measure their fitness levels. Push-up tests are often used to determine upper-body muscle endurance or stamina. While fit men might excel at this exercise, it can be a challenge for certain populations.

Dozens of push-up tests exist. Some are timed to count how many push-ups you can complete in a set period, from 60 seconds up to 2 minutes. The Army Physical Fitness Test, for example, measures how many you can do in 2 minutes.

Others are set to a tempo, so that you push up in time to a beeping metronome. The President's Challenge Fitness Awards, administered in many public schools, is an example of this type of test. Once you can't keep up with the rhythm — usually 3-second intervals — you've maxed out the test.

Other tests, such as the YMCA Fitness Test, simply ask you to complete as many push-ups as you can in a row without stopping. No set tempo or time limit is applied.

Some tests allow women or girls to put their knees down during the push-up. In general, women have less upper-body muscle mass than men, and this modification is designed to compensate for the difference. A push-up done on the knees can help less fit women achieve higher scores, but it can make the test too easy for very fit women.

In military, law enforcement and personal-trainer settings, push-ups are, for the most part, counted accurately. When school-age children are tested, however, the test administrator is usually a peer. In these cases, any attempt at a push-up may be counted, not just those that reach the required 90-degree elbow angle or lower, which skews a child's results.

The push-up test itself may be administered fairly and properly, yet the results can still give you a distorted picture of your fitness. Usually, you are rated from "poor" to "excellent" according to a chart comparing your push-up performance with that of other people of the same gender and age.

However, you'll get an accurate picture of your fitness only if you use a chart that is appropriate for your population. For example, military standards are slightly higher than general fitness norms. Compare yourself with an appropriate population to get meaningful information.

In a fitness setting, you hope a push-up test will show improvement from one test to the next, provided you've been training. For a push-up test to give an accurate reading, the test conditions must be identical each time it's performed. Room temperature, time of day, the subject's sleep the night before and pre-test fueling all matter, even for something that seems as simple as banging out a few push-ups.

Some people get performance anxiety in advance of tests, even a push-up test. This anxiety can skew their results and make them fail prematurely because of mental self-doubt. A 2009 study published in Work noted that people who fear failure on their job-related or scholastic physical fitness exams usually perform poorly because they have trouble focusing on the task.

If a push-up test is administered by a trainer to a new client, for example, it can create barriers to communication. Push-up tests in grade school present problems with peer judgement and possible teasing for kids who don't perform up to standard.

ABSTRACT

The purpose of this study was to assess the inter-rater and intra-rater reliability of the 2-minute, 90° push-up test as used in the Army Physical Fitness Test. Analysis of rater assessment reliability included both total score agreement and agreement across individual push-up repetitions. Eight raters assessed 15 different videotaped push-up performances over 4 iterations separated by a minimum of 1 week. The 15 push-up participants were videotaped during the semiannual Army Physical Fitness Test. Each rater viewed the 15 push-up tests in random order and verbally responded "yes" or "no" to each push-up repetition. The data were analyzed using the Pearson product-moment correlation, kappa, modified kappa and the intraclass correlation coefficient ICC(3,1). An attribute agreement analysis was conducted to determine the percent of inter-rater and intra-rater agreement across individual push-ups. The results indicated that raters varied a great deal in assessing push-ups. Over the 4 trials of 15 participants, the range of each rater's mean scores varied from 3.0 to 35.7 push-ups. Post hoc comparisons found a significant increase in the grand mean of push-ups from trials 1–3 to trial 4 (p < 0.05). There was also a significant difference among raters over the 4 trials (p < 0.05). Pearson inter-rater reliability coefficients ranged from 0.10 to 0.97, and intra-rater coefficients ranged from 0.48 to 0.99. Intra-rater agreement for individual push-up repetitions ranged from 41.8% to 84.8%. The results indicated that raters failed to give the same push-up repetition the same score across trials (below 70% agreement) and also failed to agree with one another (29%).
As noted above, scores on trial 4 increased significantly, which might have been caused by rater drift or by raters not maintaining the push-up standard over the trials. The final push-up score each participant received does appear to be a close approximation of actual performance (within 65%), but when physical performance is assessed for retention in the Army, a more reliable test might be considered.

INTRODUCTION

The push-up is used to assess and develop shoulder, arm and upper-body strength and work capacity.1,4 Military personnel from the Army, Navy and Coast Guard, as well as numerous NATO armed forces5,7 participate in a semiannual or annual physical fitness test (PFT) that includes a push-up test. Because of the physical nature of military service, U.S. Army personnel must pass the Army Physical Fitness Test (APFT), one assessment of physical capacity, to remain in the active Armed Services. In addition, performance on any of the military physical fitness tests is a factor in retention and promotion, which reinforces the need for optimal performance.

The Armed Services use the 90° push-up and score the number of repetitions completed in 1 to 2 minutes of exercise.2,8 A 90° push-up is successfully completed when the body is lowered as a unit from the starting position until the upper arm is parallel with the ground and the angle at the elbow is 90°. The protocol allows many people to be tested in a relatively short time without additional equipment. In the U.S. Army, push-ups are evaluated by unit officers. Proper execution of the push-up is critical to achieving the expected training outcomes, and valid assessment of physical performance and fitness levels also requires reliable evaluation of push-up technique.

Assessing proper push-up technique can be problematic because it requires accurately judging the proper "up" position, with the arms fully extended, and the "down" position, with the elbow at 90°. Hand placement and body position are easily defined. By contrast, observing complete elbow extension in the up position and accurately judging the down position9,10 are especially difficult when the test subject is moving rapidly, as in the 2-minute APFT. Military personnel, for example, typically attempt to complete as many push-ups as possible in the first minute, approaching a rate of one per second. Inter-rater reliability,11 or the agreement in scores between two or more raters, does not appear to be consistent, with reported correlations ranging from 0.22 to 0.88.10,12,13 A number of studies comparing push-up assessment by the same rater across 2 or more trials (intra-rater reliability) suggest a high degree of agreement (r = 0.85–0.97).7,14,15 In other studies, however, intra-rater agreement has been highly variable (r = 0.22–0.87).10,13,16 Some studies used the 90° push-up of the Fitnessgram10,12,16, while others used a modified bent-knee push-up.15 Based on this research, scoring appears to be somewhat rater-dependent. Murr (1997) used videotaped push-ups and found that intra-rater reliability values ranged from 0.63 to 0.97, whereas inter-rater reliability values ranged from 0.36 to 0.56.

Baumgartner et al10(p226) diagrammed proper hand placement and body position for the push-up and stated that "one problem with the push-up test is specifying a down position which individuals can successfully execute and test administrators can accurately judge". Accurate assessment of the down position requires raters to judge whether there is a 90° angle at the elbow. A related issue is rater variability: how consistently a rater scores the same push-up over time. Part of the assessment error appears to stem from how well raters can view and rate an identical push-up in the same manner over time, and decide whether it meets the standard definition and should be counted as "acceptable". Most research on push-up assessment has focused on inter-rater reliability, and little has examined intra-rater reliability across multiple trials. Further, previous research on intra-rater and inter-rater reliability has compared agreement only among total scores, not among individual push-up repetitions.

The purpose of the present study was to assess the inter-rater and intra-rater reliability of the 2-minute, 90° push-up test as used in the APFT. Analysis of rater assessment reliability included both total score agreement and agreement across individual push-up repetitions.

METHOD

Participants

Participants (n = 15; 12 males and 3 females; age = 21.2 ± 1.0 years) from the U.S. Military Academy (USMA) provided informed consent and were videotaped while performing the 2-minute push-up test, which was part of their semiannual fitness assessment requirement.

Faculty members (n = 8; 7 males and 1 female) were randomly selected from the USMA Department of Physical Education faculty to serve as raters. All raters received a written and oral description of the study and gave informed consent to participate. The raters comprised four military and four civilian personnel; two of the civilian raters were retired military. The raters had a mean age of 40.5 ± 9.8 years and had taught at USMA for an average of 8.2 ± 9.0 years. Raters with military service (including the two retired-military civilians) had served an average of 15.4 ± 8.0 years in the Army. All selected raters regularly administer the APFT during USMA's semiannual testing. All experimental procedures were reviewed and approved by the Academy's Institutional Review Board.

Protocol

The videotape of the 15 push-up tests was digitized and recorded onto a compact disc. The push-up performances were filmed from a position diagonally in front of and to the side of the performers, replicating the typical viewing angle of an actual test. To assess inter-rater and intra-rater reliability, the raters (n = 8) viewed the digitized push-up videos in real time (no slow motion or review of individual repetitions or tests) and verbally responded "yes" or "no" to each push-up repetition. Responses were recorded by study investigators. Raters completed four separate assessment trials, each time reviewing all 15 recorded tests. In each assessment trial, the 15 test videos were played in succession without stopping. The 30-second interval between individual 2-minute push-up tests matched what would be experienced during a typical semiannual APFT. Assessment trials were separated by a minimum of 1 week. This interval was chosen because the average person can retain only 5 to 9 items in working memory, which would make it very difficult for a rater to remember the evaluation of each of the 1,015 push-ups assessed in each trial.17 The order of presentation of the 15 push-up tests was varied between trials, so that individual videos were shown at the beginning, middle, or end of different trials and each test was viewed at approximately the same time over the course of the four trials.
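The text says only that the order of the 15 tests was varied between trials; it does not give the exact scheme. The minimal Python sketch below shows one possible design. It assumes a hypothetical forward / reversed / rotated / rotated-reversed ordering, which is not necessarily what the investigators used. Under this scheme each video is seen at the beginning, middle, or end of different trials, yet its average position across the four trials is the same for every video.

```python
from statistics import mean

N_VIDEOS = 15  # number of recorded push-up tests in the study


def presentation_orders(n=N_VIDEOS, shift=None):
    """Return four viewing orders (lists of video indices), one per trial.

    Hypothetical counterbalancing: forward, reversed, rotated, and rotated
    then reversed. A video's positions in each forward/reversed pair sum to
    n - 1, so its mean position over the four trials is (n - 1) / 2.
    """
    if shift is None:
        shift = n // 2  # start the rotated orders roughly mid-list
    forward = list(range(n))
    rotated = forward[shift:] + forward[:shift]
    return [forward, forward[::-1], rotated, rotated[::-1]]


def mean_positions(orders):
    """Mean 0-based viewing position of each video across all trials."""
    positions = {}
    for order in orders:
        for pos, video in enumerate(order):
            positions.setdefault(video, []).append(pos)
    return {video: mean(p) for video, p in positions.items()}


if __name__ == "__main__":
    orders = presentation_orders()
    for trial, order in enumerate(orders, start=1):
        print(f"Trial {trial}: {order}")
    print(mean_positions(orders))  # every video averages position 7
```

The reversed pairs are what equalize the average position. A different `shift` changes where the rotated trials start without breaking that property.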

PROCEDURES

Definition of a Push-up

The Armed Services use similar standards for executing a proper push-up. The push-up protocol in U.S. Army Field Manual 7-22 (2012) states that the participant begins in the front leaning rest position, with the body generally straight from shoulders to ankles. The participant "begins" the push-up by bending the elbows and lowering the body as a single unit until the upper arms are at least parallel to the ground (90° push-up). The participant then returns to the starting position by raising the entire body until the arms are fully extended. The body must remain rigid in a generally straight line and move as a unit throughout each repetition. A repetition fails and is not counted if the participant does not keep the body generally straight and moving as a single unit, does not lower the whole body until the upper arms are at least parallel to the ground, or does not extend the arms completely.2
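Read as a decision rule, the FM 7-22 standard combines three checks, and failing any one of them voids the repetition. The minimal Python sketch below encodes that rule. The `Repetition` fields are hypothetical names for the judgments a rater makes in real time. The sketch leaves the hard part studied in this paper to the rater: deciding each of those booleans reliably at close to one repetition per second.

```python
from dataclasses import dataclass


@dataclass
class Repetition:
    """What a rater judges about one push-up attempt (hypothetical fields)."""
    body_straight: bool        # body generally straight and moving as one unit
    reached_parallel: bool     # upper arms lowered to at least parallel with ground
    arms_fully_extended: bool  # arms completely extended on the way back up


def counts(rep: Repetition) -> bool:
    """A repetition counts only if every criterion is met; any failure voids it."""
    return rep.body_straight and rep.reached_parallel and rep.arms_fully_extended


def score(reps: list[Repetition]) -> int:
    """Total score = number of repetitions that meet all criteria."""
    return sum(counts(r) for r in reps)


if __name__ == "__main__":
    attempts = [
        Repetition(True, True, True),   # good repetition
        Repetition(True, False, True),  # did not go low enough
        Repetition(True, True, False),  # did not fully extend the arms
        Repetition(True, True, True),   # good repetition
    ]
    print(score(attempts))  # 2
```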

Data Analysis

Data are presented as mean ± standard deviation (SD) (Table I). Data analysis was conducted with the SPSS v. 17 statistical package (Armonk, New York) and the R statistical program. Inter-rater and intra-rater reliability were evaluated within and across the four assessment trials using the Pearson product-moment correlation, kappa, modified kappa (M-kappa) and the intraclass correlation coefficient (ICC). In addition, an attribute agreement analysis (Minitab v. 15, State College, Pennsylvania) was conducted to determine the percent of inter-rater and intra-rater agreement (reliability) across individual push-up repetitions.
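The two simplest measures named above fit in a few lines. The minimal Python sketch below covers the Pearson correlation between two raters' total scores and the percent agreement between two sets of yes/no calls on the same repetitions. The function names and sample data are hypothetical, not study data. The usage also shows a known limitation: Pearson r measures only linear association, so a rater who consistently counts more push-ups than another can still correlate perfectly with them.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def percent_agreement(calls_a, calls_b):
    """Percent of repetitions that received the same yes/no call in both ratings."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("both ratings must cover the same repetitions")
    same = sum(a == b for a, b in zip(calls_a, calls_b))
    return 100.0 * same / len(calls_a)


if __name__ == "__main__":
    # Hypothetical totals: rater B counts exactly 10 more push-ups for everyone.
    rater_a = [40, 52, 61, 70]
    rater_b = [50, 62, 71, 80]
    print(pearson_r(rater_a, rater_b))  # 1.0: perfect correlation despite the bias
    # Hypothetical yes/no calls for the same six repetitions in two trials.
    trial_1 = [True, True, True, False, True, True]
    trial_2 = [True, False, True, False, True, True]
    print(percent_agreement(trial_1, trial_2))  # about 83.3
```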

TABLE I

Summary of Total Scores for Raters by Trials (mean ± SD push-ups per participant)

          Trial 1       Trial 2       Trial 3       Trial 4       Total          Range
Rater 1   59.8 ± 16.6   58.7 ± 19.9   60.5 ± 18.8   62.5 ± 17.4   60.4ᵃ ± 1.6    3.8
Rater 2   51.7 ± 18.1   51.9 ± 19.5   58.3 ± 19.5   58.9 ± 22.1   55.2ᵃ ± 3.9    7.2
Rater 3   59.7 ± 22.7   55.0 ± 22.6   58.2 ± 21.2   58.9 ± 21.7   58.0ᵃ ± 2.1    4.7
Rater 4   61.1 ± 20.4   64.9 ± 18.3   66.3 ± 18.5   64.2 ± 18.7   64.1ᵃ ± 2.2    5.2
Rater 5   59.9 ± 16.9   62.1 ± 19.1   61.5 ± 16.0   59.1 ± 18.6   60.7ᵃ ± 1.4    3.0
Rater 6   49.2 ± 21.1   41.9 ± 22.8   30.0 ± 20.1   40.1 ± 25.2   40.3ᵃ ± 7.9    19.2
Rater 7   49.2 ± 17.1   51.7 ± 20.3   49.7 ± 22.7   52.2 ± 21.4   50.7ᵃ ± 1.5    3.0
Rater 8   40.3 ± 20.1   35.1 ± 16.4   34.0 ± 19.7   69.7 ± 16.9   44.8ᵃ ± 16.8   35.7
Mean      53.9ᵇ ± 19.9  52.7ᵇ ± 21.5  52.3ᵇ ± 22.8  58.2ᵇ ± 21.5  54.3 ± 2.7

Total: mean ± SD of the four trial means. Range: highest minus lowest trial mean.

The Pearson product-moment correlation coefficient measures the degree of linear dependence between two variables. Kappa is a measure of inter-rater agreement, while the M-kappa calculates inter-rater agreement when there are more than two unique raters and more than two ratings. Intraclass correlation coefficients (ICC 3,1) range from 0 to 1 and are "a measure of the relatedness of clustered data and accounts for the relatedness of clustered data by comparing the variance within clusters with the variance between clusters".18(p206) Artero et al9(p161) suggested that an ICC between 0.70 and 0.80 is questionable, that 0.90 is considered "high" and that a value close to 1 indicates "excellent" reliability. "ICC is an appropriate overall summary measure of agreement because it reflects both systematic bias and random error." These statistical procedures were chosen for two reasons. The design involved multiple raters over multiple trials, for which they provide more appropriate results than other statistical methods, and they were used in previous studies9,12 (i.e., Fleiss' kappa). Tukey's post hoc comparisons were used to evaluate mean differences in push-up score between raters because each rater completed an equal number of observations.
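For reference, one standard way to compute ICC(3,1) is the Shrout and Fleiss two-way mixed, consistency, single-measure form, sketched below in Python. The sketch assumes a complete table of total scores with one row per participant and one column per rater or trial; the function name is illustrative. The between-rater mean square does not enter this formula, so a rater who scores consistently higher or lower than the others does not lower ICC(3,1). Absolute-agreement forms such as ICC(2,1) are the ones that penalize that kind of systematic bias.

```python
def icc_3_1(scores):
    """ICC(3,1): two-way mixed model, consistency, single rating.

    scores[i][j] is the score of subject i from rater (or trial) j.
    Uses the Shrout & Fleiss form (BMS - EMS) / (BMS + (k - 1) * EMS).
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need an n x k table with n, k >= 2")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    bms = ss_rows / (n - 1)                # between-subjects mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    denom = bms + (k - 1) * ems
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (bms - ems) / denom


if __name__ == "__main__":
    print(icc_3_1([[1, 2], [2, 1], [3, 3]]))     # 0.5
    print(icc_3_1([[1, 11], [2, 12], [3, 13]]))  # 1.0: constant offset is ignored
```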

RESULTS

The means and SDs for the eight raters over the four trials, as well as the grand means, are presented in Table I. The grand mean of the first three trials (53.0 push-ups) was significantly lower than that of trial 4 (58.2 push-ups; p < 0.05). Tukey's post hoc analysis showed three sets of significant differences (p < 0.05). The values obtained by Raters 1 to 5 differed from those of Rater 6; Raters 1, 3, 4 and 5 differed from Rater 8; and Rater 4 differed from Rater 7. Overall, Raters 1 to 5 tended to agree with one another, while Raters 6 to 8 tended to agree with one another but not with the other raters.

Over the four trials, six of the eight raters scored the participants within an intra-rater range of 3.0 to 7.2 push-ups, or approximately 0.2 to 0.5 push-ups per participant (range/15 participants). However, two raters had ranges of 19.2 and 35.7 push-ups, which equate to 1.3 and 2.4 push-ups per participant, respectively. Notably, Rater 8's mean push-up score rose from an average of 36.5 push-ups in trials 1 to 3 to 69.7 push-ups in trial 4. This increase of over 33 push-ups came from counting numerous repetitions in two sets of push-ups as "good" when they had not been counted in previous trials.
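The per-participant figures above follow directly from Table I. The short Python sketch below recomputes them from the trial means in that table (variable and function names are illustrative). It produces the Total and Range columns, the range/15 values quoted in the text, and Rater 8's trials 1–3 average. Small differences in the last printed digit can arise from rounding.

```python
from statistics import mean, stdev

# Trial 1-4 mean push-ups per participant for each rater, copied from Table I.
TABLE_I = {
    "Rater 1": [59.8, 58.7, 60.5, 62.5],
    "Rater 2": [51.7, 51.9, 58.3, 58.9],
    "Rater 3": [59.7, 55.0, 58.2, 58.9],
    "Rater 4": [61.1, 64.9, 66.3, 64.2],
    "Rater 5": [59.9, 62.1, 61.5, 59.1],
    "Rater 6": [49.2, 41.9, 30.0, 40.1],
    "Rater 7": [49.2, 51.7, 49.7, 52.2],
    "Rater 8": [40.3, 35.1, 34.0, 69.7],
}
N_PARTICIPANTS = 15


def summarize(trial_means):
    """Recompute Table I's Total and Range columns for one rater."""
    spread = max(trial_means) - min(trial_means)
    return {
        "total_mean": mean(trial_means),
        "total_sd": stdev(trial_means),                    # sample SD of the four trial means
        "range": spread,                                   # highest minus lowest trial mean
        "range_per_participant": spread / N_PARTICIPANTS,  # the text's range / 15 figure
    }


if __name__ == "__main__":
    for rater, trials in TABLE_I.items():
        s = summarize(trials)
        print(f"{rater}: {s['total_mean']:.1f} ± {s['total_sd']:.1f}, "
              f"range {s['range']:.1f} ({s['range_per_participant']:.2f} per participant)")
    r8 = TABLE_I["Rater 8"]
    print(f"Rater 8, trials 1-3 vs trial 4: {mean(r8[:3]):.1f} -> {r8[3]:.1f}")
```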

Pearson inter-rater reliability coefficients ranged from 0.10 to 0.97, and intra-rater coefficients ranged from 0.48 to 0.99 (Table II). Intra-rater agreement for individual push-up repetitions ranged from a high of 84.8% (Rater 4) to a low of 41.8% (Rater 8) (Fig. 1). The average percent agreement of total push-up score, the kappa and M-kappa statistics, and the ICC are presented in Table III. ICC values for each rater (0.25–0.52) were below even the questionable range of 0.7 to 0.8, suggesting low intra-rater reliability.
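Percent agreement and kappa can diverge sharply. Table III pairs agreement values of roughly 49% to 95% with kappas of only about 0.02 to 0.62. Kappa subtracts the agreement expected by chance, computed from each rating's yes/no proportions. When nearly every repetition is called "yes", chance agreement is already high, which is one plausible reason for the gap. The minimal Python sketch below computes Cohen's kappa for two yes/no ratings of the same repetitions, such as one rater's trial 1 and trial 2 calls. The example data are hypothetical.

```python
def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two sets of categorical calls on the same items."""
    n = len(calls_a)
    if n == 0 or n != len(calls_b):
        raise ValueError("both ratings must cover the same, non-empty set of items")
    categories = set(calls_a) | set(calls_b)
    p_observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement: product of the two ratings' marginal proportions, summed over categories.
    p_chance = sum((calls_a.count(c) / n) * (calls_b.count(c) / n) for c in categories)
    if p_chance == 1:
        raise ValueError("kappa is undefined when both ratings use a single category")
    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Hypothetical calls for 10 repetitions: 9 of 10 scored "yes" in each trial.
    trial_1 = [True] * 9 + [False]
    trial_2 = [True] * 8 + [False, True]
    print(cohens_kappa(trial_1, trial_2))  # about -0.11, despite 80% raw agreement
```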

TABLE II

Pearson Correlation Coefficients for Inter-rater and Intra-rater Reliability (ranges; diagonal cells give intra-rater and off-diagonal cells inter-rater coefficients)

Rater 1 Rater 2 Rater 3 Rater 4 Rater 5 Rater 6 Rater 7 Rater 8 
Rater 1 0.83–0.89
Rater 2 0.45–0.94 0.48–0.88
Rater 3 0.68–0.89 0.50–0.92 0.83–0.96
Rater 4 0.65–0.87 0.48–0.92 0.74–0.92 0.89–0.99
Rater 5 0.71–0.94 0.42–0.97 0.63–0.90 0.57–0.94 0.78–0.92
Rater 6 0.21–0.55 0.24–0.68 0.35–0.76 0.12–0.75 0.21–0.59 0.66–0.86
Rater 7 0.64–0.91 0.43–0.92 0.68–0.95 0.71–0.95 0.64–0.91 0.32–0.74 0.79–0.89
Rater 8 0.51–0.85 0.35–0.85 0.61–0.91 0.60–0.97 0.53–0.89 0.10–0.65 0.51–0.85 0.52–0.84

FIGURE 1

Percent of intra-rater agreement by push-up repetition, based on an attribute agreement analysis. Agreement was tested across all push-ups in all assessment trials (n = 1,015 push-ups).

TABLE III

Intra-Rater Reliability Comparison of Mean Scores

Rater     Trial    % Agreement with           Kappa with                 95% Kappa CI  M-Kappa  ICC (3,1)
                   Trial 2  Trial 3  Trial 4  Trial 2  Trial 3  Trial 4
Rater 1   Trial 1  88.59    85.67    88.50    0.5814   0.4378   0.5113   0.43–0.57     0.5002   0.5027
          Trial 2           84.83    90.67             0.4300   0.6231
          Trial 3                    86.62                      0.4132
Rater 2   Trial 1  72.10    74.46    79.74    0.2884   0.2697   0.4147   0.33–0.46     0.3868   0.3967
          Trial 2           81.06    79.55             0.4564   0.4071
          Trial 3                    86.99                      0.5424
Rater 3   Trial 1  79.83    83.42    83.69    0.3478   0.4037   0.3985   0.36–0.50     0.4319   0.4329
          Trial 2           82.09    83.88             0.4417   0.4884
          Trial 3                    86.14                      0.5137
Rater 4   Trial 1  89.54    90.20    88.40    0.4687   0.4630   0.4311   0.44–0.61     0.5157   0.5230
          Trial 2           94.81    93.59             0.6177   0.5994
          Trial 3                    94.06                      0.5872
Rater 5   Trial 1  87.18    86.52    84.73    0.4609   0.4798   0.4272   0.39–0.54     0.4671   0.4686
          Trial 2           86.71    86.62             0.4420   0.4567
          Trial 3                    88.78                      0.5508
Rater 6   Trial 1  76.34    64.94    70.31    0.4898   0.3380   0.3797   0.37–0.47     0.4140   0.4054
          Trial 2           68.61    75.68             0.3895   0.5041
          Trial 3                    71.35                      0.4377
Rater 7   Trial 1  73.77    73.04    72.49    0.3586   0.3590   0.3226   0.30–0.42     0.3567   0.3578
          Trial 2           72.07    76.35             0.3112   0.3913
          Trial 3                    76.08                      0.4058
Rater 8   Trial 1  67.48    65.97    57.59    0.3533   0.3233   0.0197   0.15–0.22     0.1400   0.2496
          Trial 2           69.46    49.39             0.3882   0.0261
          Trial 3                    49.20                      0.0225
All       Trial 1  79.36    78.03    78.19    0.4469   0.4163   0.3425   0.40–0.44     0.4184
          Trial 2           79.96    79.47             0.4799   0.4032
          Trial 3                    79.91                      0.4196

Each row compares the listed trial with each later trial.
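Fleiss' kappa, which the Data Analysis section cites, is a standard extension of kappa to more than two raters. The sketch below is a textbook Fleiss' kappa in Python for any number of items, each rated by the same number of raters. The paper does not state whether its M-kappa was computed exactly this way, so treat the correspondence as an assumption. Each item could be one push-up repetition and each label one rater's (or one trial's) yes/no call; the example data are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa. ratings[i] lists the labels given to item i by each of m raters."""
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("no items to rate")
    m = len(ratings[0])
    if m < 2 or any(len(r) != m for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    counts = [Counter(r) for r in ratings]
    categories = set().union(*counts)

    # Observed agreement: share of agreeing rater pairs on each item, averaged over items.
    p_items = [(sum(c[k] ** 2 for k in categories) - m) / (m * (m - 1)) for c in counts]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each label.
    p_labels = [sum(c[k] for c in counts) / (n_items * m) for k in categories]
    p_e = sum(p ** 2 for p in p_labels)
    if p_e == 1:
        raise ValueError("kappa is undefined when only one label is used")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical: 4 repetitions, each called by 3 raters.
    reps = [["yes", "yes", "no"], ["yes", "yes", "yes"],
            ["no", "no", "yes"], ["no", "no", "no"]]
    print(fleiss_kappa(reps))  # about 0.33
```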