Mustafa Hasbahçeci1, Fatih Başak2, Aylin Acar2, Abdullah Şişik2

1Department of General Surgery, Bezmialem Vakıf University School of Medicine, İstanbul, Turkey
2Clinic of General Surgery, Ümraniye Training and Research Hospital, İstanbul, Turkey

Abstract

Objective: To compare international abstract evaluation systems with a national evaluation system in assessing the quality of oral presentations at the 19th National Surgical Congress, with respect to the applicability of the systems and the consistency between systems and reviewers.
Material and Methods: Fifty randomly selected observational studies, blinded for author and institution information, were evaluated by two reviewers using the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement, the Timmer Score, and the National Evaluation System. Abstract scores, evaluation times, and agreement between reviewers were compared for each evaluation system. The abstract scores produced by the three evaluation systems were regarded as the main outcome. The Wilcoxon matched-pairs signed-rank and Friedman tests were used for comparison of scores and times, kappa analysis for agreement between reviewers, and Spearman correlation for the analysis of each reviewer's scores across pairs of evaluation systems.
Results: There was no significant difference between abstract scores for any of the systems (p>0.05). A significant difference in the evaluation times of the reviewers was detected for each system (p<0.05). Agreement between reviewers was highest for the Timmer Score (moderate, κ=0.523), and the agreement for STROBE and the National Evaluation System was regarded as acceptable (κ=0.394 and κ=0.354, respectively). Assessment of the reviewers' scores across pairs of evaluation systems showed that scores increased significantly in the same direction (p<0.05).
Conclusion: The National Evaluation System is an appropriate method for the evaluation of conference abstracts because its inter-reviewer consistency is similar to that of current international evaluation systems and it is easy to apply in terms of evaluation time.

Keywords: Abstract, congress, reporting quality

Introduction

Various reporting criteria have been introduced in order to increase both the quality and the impact of scientific studies and congress abstracts (1-4). In particular, these criteria have been used to standardize presentations and to increase the reporting quality of conference abstracts (5). Several studies in the literature address the ease of implementing these systems during the evaluation and publication process that accompanies conference abstract acceptance (5-8). Nevertheless, applying these systems to every type of randomized-controlled, observational, and experimental study is a time-consuming part of the selection process. In addition, the subjective nature of some of the parameters constituting these systems seems to be an obstacle that prevents such evaluations from gaining wide acceptance (1, 8). For these reasons, the search continues for "the most appropriate system" rather than "the best system" (9).

Translation of these systems from the original into different languages creates semantic and conceptual problems. This problem prevents the systems from being used widely (3, 10). To overcome it, studies have been conducted on the development of a national congress abstract evaluation system in the Turkish language (11). This recently developed Turkish evaluation system requires further cooperative studies in order to become compliant with the widely used international systems.

The aim of this study was to examine the oral presentation abstracts of the 19th National Surgical Congress (NSC-2014), held in 2014, through a comparative analysis of international and Turkish congress abstract evaluation systems, the ease of implementation of each system by reviewers, and the agreement between reviewers and between systems.

Material and Methods

Oral abstracts that were accepted and digitally printed for NSC-2014 were electronically scanned. Abstracts were categorized as "randomized-controlled trial", "observational", "experimental" or "other type" according to study type. Randomized-controlled trials were defined as prospective studies in which participants were randomly allocated to a treatment or control group. Observational studies were defined as prospective descriptive (cohort), retrospective case-control and cross-sectional studies, and descriptive case series. Experimental studies were defined as all laboratory studies performed on animals or on tissues and cells obtained from individuals. The "other type" category comprised cost analyses, survey reports, and case studies that could not be categorized into the "randomized-controlled trial", "observational" or "experimental" groups. The distribution of the oral abstracts accepted in NSC-2014 is given in Table 1.


Randomized-controlled studies, experimental studies, and other type studies were excluded from the analysis. A sample of 50 abstracts was selected by a computer-assisted random number system from among the 404 observational studies presented in NSC-2014. This sample size enabled the detection of a 10-15% difference with a predicted accuracy of 90%. The relevant parts of the abstracts in the sample group were converted into image files with author and institution information masked in order to blind the reviewers to authors and clinics. The Turkish versions of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement, the Timmer Score, and the previously proposed National Evaluation System (NES) were uploaded to a web-based abstract assessment system and made available to the reviewers to aid in the evaluation of the observational studies (http://ulusaldegerlendirmesistemi.blogspot.com.tr/).
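As a minimal illustration of the computer-assisted random selection described above, the following Python sketch draws 50 of the 404 observational abstracts. The abstract identifiers and the fixed seed are assumptions made only so that the example is reproducible; they do not reflect the authors' actual software.

```python
# Minimal sketch of computer-assisted random sampling of abstracts.
# The identifiers and the seed are illustrative assumptions, not the
# authors' actual tooling.
import random

TOTAL_OBSERVATIONAL = 404  # observational oral abstracts in NSC-2014
SAMPLE_SIZE = 50           # abstracts drawn for review

abstract_ids = list(range(1, TOTAL_OBSERVATIONAL + 1))

random.seed(2014)  # fixed seed only so the example is reproducible
sample = sorted(random.sample(abstract_ids, SAMPLE_SIZE))

print(f"Selected {len(sample)} of {TOTAL_OBSERVATIONAL} abstracts, e.g.:", sample[:10])
```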

After individual assessment accounts were created on the system, two researchers currently working as general surgeons in a training and research hospital (AS, AA) performed their evaluations independently of one another.

The system also recorded the total time spent by each reviewer on each type of evaluation. The abstracts in the sample group were scored by the reviewers through the evaluation system using the STROBE, Timmer, and NES instruments.

Each parameter in the STROBE, Timmer, and NES instruments was scored as "0" or "1" according to whether or not the abstract contained that feature, and the declared item scores were summed to give the total score. Three of the 16 parameters in NES and four of the 19 parameters in the Timmer Score were excluded from the evaluation because they were not applicable to observational studies. Thus, the score range of the assessment tools was 0-11 for STROBE (Appendix 1), 0-15 for Timmer (Appendix 2), and 0-13 for NES (Appendix 3).
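As a minimal sketch of how such binary item scores could be turned into a total score per abstract, the following Python example sums 0/1 judgments for each instrument. The parameter counts match the score ranges given above (11 for STROBE, 15 for Timmer, 13 for NES); the item wording and the example judgments are hypothetical.

```python
# Minimal sketch: summing binary (0/1) item judgments into a total score.
# Item counts match the ranges described in the text; the example
# judgments are hypothetical.
from typing import Dict, List

INSTRUMENT_SIZES = {"STROBE": 11, "Timmer": 15, "NES": 13}

def total_score(judgments: List[int], instrument: str) -> int:
    """Sum binary judgments; each item is 1 if the abstract reports that feature."""
    expected = INSTRUMENT_SIZES[instrument]
    if len(judgments) != expected or any(j not in (0, 1) for j in judgments):
        raise ValueError(f"{instrument} expects {expected} binary items")
    return sum(judgments)

# Hypothetical judgments for a single abstract by one reviewer.
example: Dict[str, List[int]] = {
    "STROBE": [1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1],
    "Timmer": [1] * 12 + [0] * 3,
    "NES":    [1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0],
}
for name, items in example.items():
    print(name, total_score(items, name), "/", INSTRUMENT_SIZES[name])
```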

Statistical Analysis
Descriptive statistical methods (mean, standard deviation, rate, and range of values) were used for assessment of the study data. The Wilcoxon matched-pairs signed-rank and Friedman tests were used to compare the scores and the times spent on evaluation by the reviewers for each assessment system. Spearman correlation was used for the analysis of each reviewer's scores across pairs of evaluation systems. Kappa analysis was used to assess agreement between the two reviewers for each evaluation system. A kappa coefficient (κ) of less than 0.20 was defined as weak, 0.21-0.40 as acceptable, 0.41-0.60 as moderate, 0.61-0.80 as good, and a value greater than 0.80 as very good when determining the degree of agreement. A p-value of less than 0.05, with a confidence level of 95%, was considered significant.
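For illustration, the following Python sketch shows how each of the tests named above can be applied with SciPy and scikit-learn. The data, variable names, and specific comparisons are hypothetical stand-ins for the actual score and time data reported in Tables 2-5.

```python
# Minimal sketch of the statistical workflow on hypothetical data; the real
# scores and times are in Tables 2-5 and are not reproduced here.
import numpy as np
from scipy.stats import wilcoxon, friedmanchisquare, spearmanr
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
n = 50  # sampled abstracts

# Hypothetical STROBE totals (0-11) from the two reviewers.
strobe_r1 = rng.integers(3, 10, size=n)
strobe_r2 = np.clip(strobe_r1 + rng.integers(-2, 3, size=n), 0, 11)

# Paired, non-parametric comparison of the two reviewers' scores on one system.
print("Wilcoxon, reviewers on STROBE:", wilcoxon(strobe_r1, strobe_r2))

# Hypothetical per-abstract evaluation times (seconds) for the three systems.
t_strobe = rng.normal(80, 15, size=n)
t_timmer = rng.normal(100, 20, size=n)
t_nes = rng.normal(90, 15, size=n)
print("Friedman, times across systems:", friedmanchisquare(t_strobe, t_timmer, t_nes))

# Correlation of one reviewer's scores between a pair of systems.
timmer_r1 = rng.integers(5, 15, size=n)
print("Spearman, STROBE-Timmer (reviewer 1):", spearmanr(strobe_r1, timmer_r1))

# Agreement between the two reviewers on the same instrument.
print("Cohen's kappa, STROBE:", cohen_kappa_score(strobe_r1, strobe_r2))
```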

Results

Total scores and the times spent on evaluation by each reviewer for each evaluation system are given in Table 2 and Table 3. Considering both reviewers together, the highest score was 9 for STROBE (maximum score 11), 14 for Timmer (maximum score 15), and 12 for NES (maximum score 13). There was no statistically significant difference in the scores given by the reviewers for each evaluation system (Table 2).


There was a statistically significant difference in the time reviewers spent logged into the system during assessment (Table 3). Compared with the other two evaluation systems, evaluation time was significantly shorter with STROBE for reviewer 1 (p=0.001) and with NES for reviewer 2 (p=0.019) (Table 3).

Kappa analysis showed that the two reviewers agreed under every evaluation system; however, the degree of agreement was best for the Timmer evaluation system (moderate, κ=0.523) (Table 4). The kappa values for STROBE and NES were κ=0.394 (acceptable) and κ=0.354 (acceptable), respectively. Correlation analysis of each reviewer's scores between pairs of systems (STROBE-Timmer, STROBE-NES, and Timmer-NES) showed significantly positive correlations, with scores increasing together (reviewer 1: p=0.003, 0.001, and 0.001; reviewer 2: p=0.014, 0.001, and 0.001, respectively) (Table 5).

Discussion

Evaluation and scoring of congress abstracts with any evaluation system clearly enhances writing quality and standardization. Internationally accepted evaluation systems can be used for this purpose; however, the development and dissemination of a nationally based system also seems useful. In addition, parameters assessing the quality of the research methodology can be evaluated alongside those assessing reporting quality (8). It must always be kept in mind, however, that there is no gold standard for the arbitration and evaluation processes (9).

Observational studies constitute most of the oral and poster abstracts in scientific congresses, while randomized controlled trials and experimental studies play a minor role (1, 5, 6). Consistent with these data, observational studies constituted 85.4% of the oral abstracts in NSC-2014. In addition, there has been a tendency for most of the abstracts evaluated for poster presentation at congresses to be accepted (12). Thus, only oral abstracts of observational studies were evaluated in this study.

Evaluation systems such as the Consolidated Standards of Reporting Trials (CONSORT) and STROBE are widely used in the scientific field. It appears that these systems should be applied separately to randomized-controlled trials and to observational studies, and the determination of separate criteria for each type of study has been suggested (3, 4, 10, 13). In contrast, the evaluation system designed by Timmer et al. (8) seems applicable to a wider variety of study types, including experimental studies. However, it has commonly been observed that some modifications are made according to the type of study (5, 8, 11). Although NES is designed for the evaluation of abstracts from both observational and randomized controlled studies, some of its parameters require distinction according to study type (11). Thus, implementing two different evaluation systems for observational and randomized-controlled studies seems more appropriate.

In the evaluation system developed by Timmer et al., 225 seconds were spent per abstract; in addition, a handbook explaining the relevant parameters had been given to the reviewers in advance (8). Bydder et al. (7) spent three hours on 63 abstracts with a five-parameter evaluation system designed solely for their own study, corresponding to about 180 seconds per abstract. In our study, the time required for evaluation with the Timmer system was shorter (75.8-123.0 seconds). The times required for evaluation with STROBE and NES were 54.5-111.3 and 82.8-103.2 seconds, respectively. The effect of reviewers spending more time on an evaluation remains unclear, so no assessment of this relationship could be performed. However, it can be stated that the newly developed NES was evaluated within acceptable time periods, as with the other systems.

Another important problem during the evaluation of congress abstracts is agreement between reviewers (7). Disagreement becomes more obvious for the subjective parameters of an evaluation system, such as importance, originality, and the potential to stimulate academic and scientific contribution (9, 14). The participation of reviewers from different centers also affects this agreement (14). Moreover, some studies in the literature suggest that, during the evaluation of congress abstracts, the agreement between several reviewers shown by kappa analysis is very close to that expected by chance alone (12). Kappa coefficient values ranging from 0.11 to 0.60 have been reported for agreement between reviewers, and, interestingly, values higher than 0.40 have rarely been seen (9, 12). Taking these findings into account, it can be stated that agreement between reviewers has been, at best, moderate. In this study, the kappa values showing agreement between reviewers were 0.393, 0.523, and 0.354 for STROBE, Timmer, and NES, respectively. These values are acceptable and consistent with similar studies in the literature. A high degree of agreement between reviewers indicates a reliable system; however, it can also indicate a bias in which a group of abstracts is affected either positively or negatively in a systematic way (9). Taking these factors into account, a moderate degree of agreement seems to be the most desirable.

It has been emphasized that valuable scientific refereeing requires reviewers younger than 40 years of age with training in epidemiology or statistics (14). The reviewers participating in this study were younger than 40; however, they had no training in epidemiology or statistics. There are several approaches to increasing the degree of agreement, such as preparing a user manual, publishing articles that explain each parameter with examples, and allotting as little space as possible to subjective parameters (8, 10, 13, 14). Even so, significant differences are observed between abstracts accepted for and rejected from conferences (7, 9, 12). Thus, the evaluation systems currently in use seem to be an efficient way of assessing abstracts despite disagreement between reviewers (7, 14).

Well-attended consensus meetings are organized to determine the quality-related parameters of current instruments during the preparation of checklists for evaluating an article or an abstract (2, 4, 9). The agreed parameters are revised at regular intervals. Moreover, the STROBE group recommends prior coordination with its centers before the evaluation system for observational studies is translated into another language. Turkish translations of the STROBE and CONSORT criteria have been published as a result of such mutual adaptation studies (15, 16). In addition, STROBE has become a system through which public criticism and contributions can be received via a web page (3, 10). Further review and consensus meetings are required to ensure the widespread use of a nationally based congress abstract assessment system.

Evaluation instruments specially designed for specific conferences can also be found in the literature, alongside the CONSORT and STROBE evaluation systems, which are well described and prepared through defined processes (7, 14). Regardless of which of these systems is chosen, the more important point is that congress abstracts be evaluated with one of them at all.

There are articles in the literature containing detailed descriptions of all the criteria of the CONSORT and STROBE evaluation systems (10, 13), and it has been recommended that evaluation systems be used together with such explanatory articles (4). This approach enhances agreement between reviewers and ensures that the resulting scores are more accurate.

Study Limitations
There are several limitations to this study. Because of the small number of randomized-controlled trials, only observational studies were investigated. The reviewers who evaluated the abstracts had no particular training in statistics or epidemiology. Correlation analysis between the systems could not be assessed and interpreted properly because the systems have different maximum scores.

Conclusion

The National Evaluation System, which is open to further development and is expected to gain nationwide acceptance, seems to be an appropriate system for the evaluation of congress abstracts. Evaluation meetings with broad participation and further studies assessing the system are required in order to receive criticism of, and contributions to, the recommended system. A guideline should be published to increase agreement between reviewers and ease of application. Adapting the system to different study types is expected to further ease its implementation.

Peer Review

Externally peer-reviewed.

Author Contributions

Concept – M.H., F.B.; Design – M.H., F.B.; Supervision – M.H., F.B.; Resources – A.A., A.Ş.; Materials – A.Ş., A.A.; Data Collection and/or Processing – M.H., F.B., A.A., A.Ş.; Analysis and/or Interpretation – M.H., F.B.; Literature Search – M.H., F.B., A.Ş., A.A.; Writing Manuscript – M.H., F.B.; Critical Review – M.H., F.B., A.A., A.Ş.; Other – A.A., A.Ş.

Conflict of Interest

No conflict of interest was declared by the authors.

Financial Disclosure

The authors declared that this study has received no financial support.

References

  1. Can MF, Öztaş M, Yağcı G, Öztürk E, Yıldız R, Peker Y, et al. Ulusal Cerrahi Kongreleri'nde sunulan randomize kontrollü çalışma özetlerinin raporlama kalitesi: CONSORT kılavuzuna dayalı değerlendirme. Ulus Cerrahi Derg 2011; 27: 67-73.
  2. Vandenbroucke JP. STREGA, STROBE, STARD, SQUIRE, MOOSE, PRISMA, GNOSIS, TREND, ORION, COREQ, QUOROM, REMARK and CONSORT: for whom does the guideline toll? J Clin Epidemiol 2009; 62: 594-596.
  3. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP; STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol 2008; 61: 344-349.
  4. Hopewell S, Clarke M, Moher D, Wager E, Middleton P, Altman DG, et al. CONSORT for reporting randomised trials in journal and conference abstracts. Lancet 2008; 371: 281-283.
  5. Knobloch K, Yoon U, Rennekampff HO, Vogt PM. Quality of reporting according to the CONSORT, STROBE and Timmer instrument at the American Burn Association (ABA) annual meetings 2000 and 2008. BMC Med Res Methodol 2011; 11: 161.
  6. Yoon U, Knobloch K. Assessment of reporting quality of conference abstracts in sports injury prevention according to CONSORT and STROBE criteria and their subsequent publication rate as full papers. BMC Med Res Methodol 2012; 12: 47.
  7. Bydder S, Marion K, Taylor M, Semmens J. Assessment of abstracts submitted to the annual scientific meeting of the Royal Australian and New Zealand College of Radiologists. Australas Radiol 2006; 50: 355-359.
  8. Timmer A, Sutherland LR, Hilsden RJ. Development and evaluation of a quality score for abstracts. BMC Med Res Methodol 2003; 3: 2.
  9. Rowe BH, Strome TL, Spooner C, Blitz S, Grafstein E, Worster A. Reviewer agreement trends from four years of electronic submissions of conference abstract. BMC Med Res Methodol 2006; 6: 14.
  10. Vandenbroucke JP, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, Pocock SJ, et al. STROBE Initiative. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Int J Surg 2014; 12: 1500-1524.
  11. Hasbahçeci M, Başak F, Uysal Ö. Evaluation of reporting quality of the 2010 and 2012 National Surgical Congress oral presentations by CONSORT, STROBE and Timmer criteria. Ulus Cerrahi Derg 2014; 30: 138-146.
  12. Rothwell PM, Martyn CN. Reproducibility of peer review in clinical neuroscience. Is agreement between reviewers any greater than would be expected by chance alone? Brain 2000; 123: 1964-1969.
  13. Hopewell S, Clarke M, Moher D, Wager E, Middleton P, Altman DG, et al. CONSORT for reporting randomized controlled trials in journal and conference abstracts: explanation and elaboration. PLoS Med 2008; 5: e20.
  14. Montgomery AA, Graham A, Evans PH, Fahey T. Inter-rater agreement in the scoring of abstracts submitted to a primary care research conference. BMC Health Serv Res 2002; 2: 8.
  15. Karaçam Z. STROBE bildirimi: Epidemiolojide gözlemsel araştırma raporu yazımının güçlendirilmesi için bir rehber. Anadolu Hemşirelik ve Sağlık Bilimleri Dergisi 2014; 17: 1.
  16. Available from: http://www.consort-statement.org/downloads/translations.