Mining Online Author’s Publication to Report the Core Research Domain with PubMed MeSH Terms: a Systematic Review for a Journal

Background

[...] and well-defined information about the whole article, which cannot be inferred merely by reading the title and the abstract. Our review of the relevant literature revealed no studies that have applied MeSH terms to identify author research domains (RDs). Our online mining approach will improve the ability of authors to objectively report their RDs when using increasingly large and complex PubMed data. Scientific publication is one of the objective measurements used to evaluate the achievements of a medical specialty or discipline [8]. Many journals are included in the Thomson Reuters Science Citation Index (SCI). Since the advent of bibliometrics, citation analysis has been widely used in many disciplines to evaluate the influence of academic articles [9][10][11][12][13][14][15][16][17].
Social network analysis (SNA) [18][19][20] defines authors, journals, or papers as the "nodes" of a network, each connected to another node by a relationship represented as an edge [21,22]. Several algorithms and measures have been developed and used with SNA to explore data graphically. When the aim is to investigate which author or paper best fits the research domain of a journal and its scope within the journal's MeSH network, centrality measures can be applied [22]; that is, the core subject can be analyzed using a centrality measure [23,24]. We aimed to report the prestigious authors and prestigious papers that contributed to a journal, both of which can be used retrospectively to (i) calculate the journal's past SCI IFs and (ii) show graphical representations of the RD for the author and the journal within the network of MeSH terms.

Data sources
We downloaded 2,053 abstracts published in the Journal of Medical Internet Research (JMIR) from the US National Library of Medicine (NLM; Pubmed.com), covering July 1, 1999 to April 3, 2017. The corresponding first authors, MeSH terms, and cited-by papers, along with the journal names, were extracted from the online website using an author-made MS Excel VBA (Visual Basic for Applications) module (Multimedia 1). A total of 1,945 cited-by journals were obtained from the JMIR website (Multimedia 2), of which 1,055 were included in Thomson Reuters in line with the 2015 5-year SCI impact factors from JCR.

A concentration ratio (CR4) was computed from the top four market shares [i.e., ratio = the sum of the top four shares divided by the sum of all shares (=1.0)]. CR4=0 represents no concentration, indicating perfect competition or at the very least monopolistic competition; in that case, the four largest firms (here, topics) in the industry would not have any significant market share. CR4<50% represents low concentration, indicating that the category ranges from perfect competition to an oligopoly. A CR4 value of 50%-80% represents medium concentration, suggesting an oligopoly. A CR4 value of 80%-100% represents high concentration, indicating that the category ranges from an oligopoly to a monopoly. A CR4 value of 100% implies total concentration, indicating an extremely concentrated oligopoly or a monopoly. The CR4 is applied to the networks below to examine the author core domain with MeSH terms based on betweenness centrality (BC).
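The CR4 calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' VBA module: the function names and the example counts are hypothetical, and assigning the shared boundary value of 80% to the "high" band is an assumption, since the text lists 80% in both the medium and high ranges.

```python
def concentration_ratio(shares, top_n=4):
    """Return the top-n concentration ratio (CR4 when top_n=4) as a fraction of the total."""
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive value")
    top = sorted(shares, reverse=True)[:top_n]
    return sum(top) / total


def concentration_level(cr4):
    """Map a CR4 fraction to the bands in the text (80% assigned to 'high' by assumption)."""
    if cr4 < 0.5:
        return "low"
    if cr4 < 0.8:
        return "medium"
    return "high"


# Hypothetical citation counts for six citing journals (not data from the study).
counts = [40, 30, 20, 10, 50, 50]
cr4 = concentration_ratio(counts)   # (50 + 50 + 40 + 30) / 200
print(cr4, concentration_level(cr4))
```

Raw counts can be passed directly because the function divides by the total, so the shares need not be normalized beforehand.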

The first aim on the JMIR annual impact factors
The journal 5-year impact factor in a given year (e.g., 2015) is defined as the number of citations in that year to articles published in the previous five years (e.g., 2010-2014), including journal self-cites, divided by the number of articles published during that period [25,26]. We calculated the eligible SCI citation number for each JMIR paper, summed the citation numbers over the previous two and five years, and divided by the number of papers published in the respective period. Fourteen impact factors were computed for the 2-year and 5-year windows, starting from 2001 and 2004, respectively, up to 2017.
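As a worked illustration of the definition above, the following sketch computes 2-year and 5-year impact factors from per-year tallies. The function name, the dictionary layout, and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def impact_factor(citations_by_pub_year, papers_by_pub_year, year, window=5):
    """Citations received in `year` to papers from the previous `window` years,
    divided by the number of papers published in those years."""
    years = range(year - window, year)
    cites = sum(citations_by_pub_year.get(y, 0) for y in years)
    papers = sum(papers_by_pub_year.get(y, 0) for y in years)
    if papers == 0:
        return None  # undefined when nothing was published in the window
    return cites / papers


# Hypothetical tallies: papers published per year, and citations received
# in 2015 keyed by the publication year of the cited paper.
papers = {2010: 10, 2011: 20, 2012: 20, 2013: 25, 2014: 25}
cites_2015 = {2010: 50, 2011: 60, 2012: 80, 2013: 110, 2014: 100}

print(impact_factor(cites_2015, papers, 2015, window=5))  # 400 / 100
print(impact_factor(cites_2015, papers, 2015, window=2))  # 210 / 50
```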

The second aim of the weighted scores of the JMIR papers
We applied both a weighted score (i.e., referring to the impact factors of the cited-by papers' journals) and an un-weighted score (i.e., simply counting the SCI cited-by papers) to each paper. The weighted score of a paper is a metric of its excellence and of its contribution (or prestige) to the journal. In this study, we excluded cited-by papers that were JMIR self-cites, so that the author and paper contribution analysis reflects prestige derived entirely from outside journals.
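A minimal sketch of the two scores follows. It assumes that the weighted score is the sum of the impact factors of the citing SCI journals, which the text implies but does not state explicitly; the function name, the journal names other than the self-cite label, and the impact factor values are hypothetical.

```python
def paper_scores(citing_journals, journal_if, self_journal="J Med Internet Res"):
    """Return (weighted, unweighted) scores for one paper.

    citing_journals: journal name of each citing paper (one entry per citation).
    journal_if: impact factor per SCI journal; journals absent from this
                mapping are treated as non-SCI and ignored.
    Assumption: weighted score = sum of the citing journals' impact factors.
    """
    weighted = 0.0
    unweighted = 0
    for journal in citing_journals:
        if journal == self_journal:
            continue  # exclude self-cites
        if journal not in journal_if:
            continue  # exclude non-SCI citing journals
        weighted += journal_if[journal]
        unweighted += 1
    return weighted, unweighted


# Hypothetical example: five citations, one self-cite and one non-SCI journal.
citing = ["Journal A", "Journal B", "Journal A", "J Med Internet Res", "Journal C"]
ifs = {"Journal A": 2.5, "Journal B": 4.0, "J Med Internet Res": 5.0}
print(paper_scores(citing, ifs))
```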

The third aim of the most productive author in JMIR
The most productive author was selected by the number of papers published in JMIR. The author's research domain was drawn using an SNA visualization.

The fourth aim of the weighted and un-weighted scores of the JMIR authors
Similar to the weighted scores (i.e., adjusted by the journal impact factor) used for the best paper selection, we selected the first author by the weighted and un-weighted scores based on their cited-by papers, excluding JMIR self-cites.

The fifth aim on the MeSH terms for presenting JMIR aims and scope
Firstly, using the weighted score including JMIR self-cites, we obtained the score for each major MeSH term (i.e., marked with an asterisk) in an article and averaged the scores when a term appeared in several JMIR papers. Secondly, every possible pair of MeSH terms in an article was linked without duplication (e.g., either A->B or B->A was selected once and only once). Thirdly, the combined weighted score was generated by multiplying the two averaged weighted scores within a pair (e.g., either A->B or B->A). The total combined weighted score was computed for each linked pair. Finally, the top 100 combined weighted scores were selected and applied to the social network analysis. Pajek SNA software [27] was used to draw the pattern of JMIR aims and scope with MeSH terms. Interested readers are referred to Multimedia 3 & 4.
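The pairing steps above can be sketched as follows. This is an illustrative reading of the procedure rather than the authors' implementation: the function name and input layout are hypothetical, and summing the product over every article in which a pair occurs is an assumption about how the "total combined weighted score" is formed.

```python
from collections import defaultdict
from itertools import combinations


def top_mesh_pairs(papers, top_n=100):
    """papers: list of (weighted_score, [major MeSH terms]) per article.

    Returns the top_n undirected term pairs with their combined scores.
    """
    # Step 1: average the paper scores for each term across papers.
    totals, counts = defaultdict(float), defaultdict(int)
    for score, terms in papers:
        for term in set(terms):
            totals[term] += score
            counts[term] += 1
    mean_score = {t: totals[t] / counts[t] for t in totals}

    # Steps 2-3: link each unordered pair once per article and multiply
    # the two averaged scores (assumed to be summed over articles).
    pair_score = defaultdict(float)
    for _, terms in papers:
        for a, b in combinations(sorted(set(terms)), 2):
            pair_score[(a, b)] += mean_score[a] * mean_score[b]

    # Step 4: keep the highest-scoring pairs (ties broken alphabetically).
    ranked = sorted(pair_score.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical input: two articles with their weighted scores.
papers = [
    (10.0, ["internet", "telemedicine"]),
    (30.0, ["internet", "health promotion"]),
]
for pair, score in top_mesh_pairs(papers):
    print(pair, score)
```

Sorting the terms before pairing is what guarantees that A->B and B->A map to the same key, so each link is counted once. The resulting pair list could then be exported as an edge list for Pajek or another SNA tool.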

Whether an author or paper fits the journal's scope, by cohesion score
Betweenness centrality expresses how often a specific node is found on the shortest route between each pair of nodes in the network [28]. We normalized the betweenness centrality yielded by Pajek software for each actor (i.e., MeSH term in this example). A higher betweenness score means that the node is more central and important to the network. Any author's research domain (or a paper's domain), represented by its top 100 major MeSH terms, can be matched against the journal's MeSH terms. The cohesion measure, ranging from 0 to 1.0, can be computed as the portion of topics explained by the total journal topics within the JMIR major MeSH terms, weighted by their BC scores. Figure 1 presents the study flowchart. The JMIR MeSH trend across years is presented based on the relative frequency of topic (i.e., term) appearance. The bubble size shows the strength of a topic in comparison with the frequency of the others.

Open Acc Biostat Bioinform, Copyright © Tsair Wei Chien, Volume 1 - Issue 3
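To make the betweenness and cohesion ideas concrete, here is a small self-contained sketch. The study used Pajek for the centrality values; the code below instead uses Brandes' algorithm on an undirected, unweighted graph as a stand-in. The cohesion formula shown (BC mass of the journal terms that the author shares, divided by the BC mass of all journal terms) is an assumption about the verbal definition above, and the example network is hypothetical.

```python
from collections import defaultdict, deque


def betweenness_centrality(edges):
    """Normalized betweenness centrality for an undirected, unweighted graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    nodes = list(adj)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(nodes)
    # Each unordered pair was counted twice, so dividing by (n-1)(n-2)
    # scales the values to the range [0, 1].
    scale = (n - 1) * (n - 2)
    return {v: (bc[v] / scale if scale > 0 else 0.0) for v in nodes}


def cohesion(author_terms, journal_bc):
    """Assumed definition: share of the journal's total BC carried by the
    journal terms that also appear among the author's terms."""
    total = sum(journal_bc.values())
    if total == 0:
        return 0.0
    shared = set(author_terms)
    return sum(score for term, score in journal_bc.items() if term in shared) / total


# Hypothetical journal network: a chain of four MeSH terms.
edges = [
    ("telemedicine", "internet"),
    ("internet", "health promotion"),
    ("health promotion", "patient education"),
]
journal_bc = betweenness_centrality(edges)
print(journal_bc)
print(cohesion(["internet", "nicotine"], journal_bc))
```

In this chain the two inner terms each lie on two of the three possible shortest paths between other terms, so an author who shares only one of them covers half of the journal's BC mass.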

The first aim of the JMIR annual impact factors
The 2-year and 5-year impact factors for JMIR are presented in Table 1. All the JMIR 5-year impact factors have been no less than 3.0 since 2004, and all the JMIR 2-year impact factors have been no less than 1.78 since 2001. Since 2004, the 5-year impact factor was greater than the 2-year impact factor in more years than the reverse (i.e., 8/6=1.33). The reasons for the low impact factor in 2017 are that (i) data collection ended early, on 2017 Apr 3, before the JCR report is released in mid-June of the year, and (ii) some SCI papers were perhaps not included in this study. The concentration ratio (CR4: the top four market share among all 1,055 cited-by journals) is 24.46%, indicating low concentration, ranging from perfect competition to an oligopoly. JMIR accounts for a large part of the citations among the top 20 SCI cited-by journals (see Figure 2).

The second aim of the weighted scores of the JMIR papers
The most prestigious paper (weighted score=108.52) is the one [29] entitled "Health-Related Effects Reported by Electronic Cigarette Users in Online Forums", published online on 2013 Aug 08; works citing it are listed at http://www.jmir.org/article/citations/2324. It has two major MeSH terms (telemedicine and internet) and was cited by 58 papers, of which 50 were SCI papers. Its cohesion measure fitting JMIR is 4.5% with two major MeSH terms (i.e., internet and telemedicine). Three authors, My Hua, Mina Alfi, and Prue Talbot, contributed to this paper.

The third aim of the most productive author in JMIR
Author Gunther Eysenbach is the most productive author, with 27 papers published in JMIR and 20 SCI papers, including JMIR papers. His cohesion measure fitting JMIR is high at 34% with 25 major MeSH terms. His research domain (Figure 3) is represented by the MeSH terms extracted from 20 papers downloaded from Pubmed.com. The strongest MeSH term (with a high betweenness centrality) shown in Figure 3 is internet in Cluster 2, followed by internet/utilization in Cluster 6, internet/organization & administration/standards in Cluster 11, and patient education as topic/methods in Cluster 21 (see the names in red with the cluster number in parentheses). The CR4 is 43.91%, indicating low concentration: the author's core research domain is only slightly oligopolistic (i.e., interdisciplinary rather than centralized on a few domains).

The fourth aim of the weighted scores of the JMIR authors
The author Hua [29] has a high weighted score (=108.52). A total of 30 citations from other SCI papers (excluding JMIR) were found. Hua [29] is the first author of the paper contributing most to JMIR and has published only one paper in JMIR to date. The author's research domain (Figure 4) was drawn using the MeSH terms from the six published papers downloaded from the PubMed library.

The fifth aim on the MeSH terms for presenting JMIR aims and scope
The JMIR core scope is easily classified in Figure 5. The core MeSH term is internet, in Cluster 2 at the top left corner. Others are internet/utilization in Cluster 1, and telemedicine/methods and health promotion/methods in Cluster 3. The CR4 is 79.82%, indicating that the JMIR core scope is dominant, in a range from an oligopoly to a monopoly. The cohesion measure of JMIR fitting itself is, as expected, 100% with 79 major MeSH terms.

Trends in cohesion measures and JMIR topics over years
Trends in cohesion measures, as well as counts of papers and MeSH terms, are shown in Figure 6. When the 2016 data were excluded because of incomplete MeSH terms, the least correlation coefficient between cohesion and the paper count was improved from 0.77 to 0.44. The average cohesion measure for all JMIR papers has remained above 2.5% since 2005. It is interesting to see the JMIR core topics over the years in Figure 7. Two stages are apparent: an early stage on legislation and internet trends (before 2004), followed by a focus on the internet over the most recent 12 years. The annual frequency of occurrence is given in brackets. The symbol in the last column denotes that the two counts are significantly different, that is, the lower proportion plus 1.96 times the pooled SE is less than 0.5 (i.e., a probability of one half for a pair count comparison).

[...] internet/*utilization and telemedicine/*methods in other clusters. We look forward to seeing more authors combine SNA and MeSH terms to explore journal scope in the future.

Strengths of this study
We provide several videos in Multimedia to show interested readers (i) how to extract such data from the US NLM (Pubmed.com), (ii) how to organize research data to be congruent with the study purpose, and (iii) how to use Pajek SNA software and calculate the cohesion measure with respect to the journal. Future researchers are encouraged to imitate (or even emulate) this approach in other studies that combine SNA and MeSH terms. Our approach differs somewhat from search and extraction methods for the PubMed database [1,2] and from studies that merely apply SNA to a topic of interest [36][37][38]: we used SNA to analyze the MeSH terms associated with the journal, papers, and authors. It also differs from other applications to health report issues [39,40]. In Figure 5 we can see that JMIR is centered on the keyword "internet." With a similar method, we can investigate other journals and report their concentration ratios. If no core journal feature exists in its network, the journal has no distinct image (or product position) in an author's mind. Exploratory data analysis differs considerably from initial data analysis (IDA) [41], which focuses more narrowly on checking the assumptions required for model fitting and hypothesis testing, handling missing values, and making transformations of variables as needed. In contrast, exploratory data analysis encompasses IDA and helps us make policy and strategy decisions in management.

Limitations and future studies
There are several limitations in this study. First, all data were extracted from Pubmed.com. Some papers without any MeSH term are included in this study, because MeSH terms are assigned manually to each document by biomedical subject specialists based on the context of the whole document [2] and this assignment lags behind publication; this will affect the results and the inferences made in this study. Second, many algorithms are used for SNA. We applied only the separate-components algorithm shown in the figures. Any change in the algorithm used will present a different pattern and lead to a different judgment. Third, we applied concentration ratios as an index to represent the core feature of the journal or the author. The definition of the concentration ratio does not use the market shares of all the MeSH terms in the studied network. The selection of the top 100 paired MeSH terms, the top four used to compute the concentration ratio, and the betweenness centrality scores of SNA may be biased when compared with other criteria. The concentration ratios provide a sign of the oligopolistic nature of a network and indicate the degree of competition. The Herfindahl index [42] (also known as the Herfindahl-Hirschman Index, or HHI), which provides a more complete picture of industry concentration than the concentration ratio does, is recommended for future studies.
Fourth, social network analysis is not restricted to the Pajek software used in this study; others, such as Ucinet [43] and Gephi [44], are suggested to readers for future use. Fifth, all data were downloaded early, on 2017 Apr 3, so the impact factor might be lower than expected. Another reason may be that some SCI papers were not included in this study. Applying the 2015 SCI journal list to all previous years to identify the cited-by SCI journals is another limitation.
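The third limitation can be illustrated numerically. The sketch below computes the Herfindahl index as the sum of squared shares, its standard definition, and compares two hypothetical distributions that have the same top-four share but very different concentration. The function name and the numbers are illustrative only.

```python
def herfindahl_index(shares):
    """Herfindahl-Hirschman Index: sum of squared shares, from 1/n up to 1."""
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive value")
    return sum((s / total) ** 2 for s in shares)


# Two hypothetical networks with four terms each. The top four terms hold
# 100% of the total in both, so CR4 cannot tell them apart.
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]

print(herfindahl_index(even))    # four equal shares
print(herfindahl_index(skewed))  # one dominant term
```

Because every share enters the sum, the index separates an evenly spread network from one dominated by a single term, which a top-four ratio alone cannot do.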

Conclusion
Authors should submit their research domain to the target journal along with their manuscript. Journal editors are likewise expected to evaluate the author's cohesion measure with respect to the journal and to display the author's core research domain using MeSH terms.

Figure 1: Flow chart of this study.

Figure 3: Research domain of the author Gunther Eysenbach, displayed using MeSH terms from the 20 published papers in the PubMed library.

Figure 4 shows that the author's cohesion measure is 5.7% with 3 terms fitting JMIR MeSH terms, and that the MeSH term nicotine/administration & dosage bridges clusters 2 and 4 at the top left corner. The CR4 is 100%, including only the major MeSH terms nicotine/administration & dosage and internet, indicating that author Hua's [29] core feature is a monopoly. The top 100 MeSH terms (i.e., 79 unique MeSH terms) were used to draw the pattern of the JMIR core scope.

Figure 4: Research domain of the author Hua MY [29], displayed using MeSH terms from the six published papers.

Figure 5: Journals citing JMIR, displayed together to visualize the domain of JMIR.

Figure 6: Trends of cohesion measures for JMIR over years.

How to cite this article: Chien C H, Tsair W C. Mining Online Author's Publication to Report the Core Research Domain with PubMed MeSH Terms: a Systematic Review for a Journal. Open Acc Biostat Bioinform. 1(3). OABB.000515. 2018. DOI: 10.31031/OABB.2018.01.000515

Note. ‡IF: Impact Factor; † when the value in the 2-year IF is greater than that in the 5-year IF; § denotes publication data collected until 2017 Apr 03.