‘Reliability’ of any research is the degree to which it gives a consistent score across a range of measurements. It can thus be viewed as ‘repeatability’ or ‘consistency’. In summary:
• Inter-rater: Different people, same test.
• Test-retest: Same people, different times.
• Parallel-forms: Different people, same time, different test.
• Internal consistency: Different questions, same construct.
Inter-Rater Reliability-
When multiple people give assessments of some kind, or are the subjects of some test, then similar people should lead to the same resulting scores. Inter-rater checks can be used to calibrate people, for example those acting as observers in an experiment.
Inter-rater reliability thus evaluates reliability across different people.
Two major ways in which inter-rater reliability is used are (a) testing how similarly people categorize items, and (b) how similarly people score items.
Inter-rater reliability is also known as inter-observer reliability or inter-coder reliability.
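The two uses above can be quantified directly. Below is a minimal sketch, assuming two raters and hypothetical example data: Cohen’s kappa for (a) how similarly people categorize items, and a simple Pearson correlation for (b) how similarly people score items. The rater and score lists are illustrative, not from any real study.

```python
# Sketch of two common inter-rater checks (hypothetical data).
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Agreement on categorical judgements, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum((freq_a[k] / n) * (freq_b[k] / n) for k in freq_a)
    return (observed - expected) / (1 - expected)

def pearson_r(x, y):
    """Agreement on numeric scores: plain Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# (a) how similarly two observers categorize the same items
rater_a = ["pass", "pass", "fail", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "fail"]
print(cohen_kappa(rater_a, rater_b))   # 1.0 = perfect agreement, 0 = chance level

# (b) how similarly two observers score the same items
scores_a = [4, 3, 5, 2, 4]
scores_b = [5, 3, 4, 2, 4]
print(pearson_r(scores_a, scores_b))
```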
Test-Retest Reliability-
An assessment or test of a person should give the same results whenever you apply the test.
Test-retest reliability evaluates reliability across time.
Reliability can vary with the many factors that affect how a person responds to the test, including their mood, interruptions, time of day, etc. A good test will largely cope with such factors and give relatively little variation. An unreliable test is highly sensitive to such factors and will give widely varying results, even if the person re-takes the same test half an hour later.
This method is particularly used in experiments that use a no-treatment control group that is measured pre-test and post-test.
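A minimal sketch of a test-retest check follows, assuming the same people take the same test on two occasions; the score lists are hypothetical, and Python 3.10+ is assumed for statistics.correlation.

```python
from statistics import correlation  # available in Python 3.10+

scores_time1 = [12, 18, 9, 15, 20, 11]   # scores at the first sitting
scores_time2 = [13, 17, 10, 14, 19, 12]  # same people, same test, later date

# A high correlation suggests the test gives stable results over time;
# a low one suggests it is sensitive to mood, interruptions, time of day, etc.
print(correlation(scores_time1, scores_time2))
```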
Parallel-Forms Reliability-
One problem with questions or assessments is knowing which questions are the best ones to ask. A way of discovering this is to do two tests in parallel, using different questions.
Parallel-forms reliability evaluates different questions and question sets that seek to assess the same construct.
Parallel-Forms evaluation may be done in combination with other methods, such as Split-half, which divides items that measure the same construct into two tests and applies them to the same group of people.
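A minimal sketch of a parallel-forms check, assuming the same group of people takes two different forms of a test aimed at the same construct; the totals below are hypothetical, and Python 3.10+ is assumed for statistics.correlation.

```python
from statistics import correlation  # available in Python 3.10+

form_a_totals = [34, 28, 41, 22, 37, 30]  # total score on question set A
form_b_totals = [32, 27, 43, 24, 35, 31]  # total score on question set B, same people

# If the two forms really measure the same construct, the totals
# should correlate highly across people.
print(correlation(form_a_totals, form_b_totals))
```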
Internal Consistency Reliability-
When asking questions in research, the purpose is to assess the response against a given construct or idea. Different questions that test the same construct should give consistent results.
Internal consistency reliability evaluates individual questions in comparison with one another for their ability to give consistently appropriate results.
Average inter-item correlation compares correlations between all pairs of questions that test the same construct by calculating the mean of all paired correlations.
Average item-total correlation calculates a total score across all items for each person, correlates each individual item with that total, then averages these item-total correlations.
Split-half correlation divides items that measure the same construct into two tests, which are applied to the same group of people, then calculates the correlation between the two total scores.
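The three measures above can be computed directly from a response matrix. Here is a minimal sketch, assuming each row of responses is one person and each column is one question targeting the same construct; the data and the particular half-split are hypothetical, and Python 3.10+ is assumed for statistics.correlation.

```python
from itertools import combinations
from statistics import correlation, mean  # Python 3.10+

responses = [  # 6 people x 4 items, each scored 1-5
    [4, 5, 4, 3],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
]
items = list(zip(*responses))  # one tuple of scores per item

# Average inter-item correlation: mean correlation over all pairs of items.
avg_inter_item = mean(correlation(a, b) for a, b in combinations(items, 2))

# Average item-total correlation: correlate each item with the total score,
# then average those correlations.
totals = [sum(person) for person in responses]
avg_item_total = mean(correlation(item, totals) for item in items)

# Split-half correlation: split the items into two halves, total each half
# per person, and correlate the two sets of half-totals.
half1 = [sum(person[:2]) for person in responses]
half2 = [sum(person[2:]) for person in responses]
split_half = correlation(half1, half2)

print(avg_inter_item, avg_item_total, split_half)
```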