Bibliometric Study of Welding Scientific Publications by Big Data Analysis
Pavel Layus, Paul Kah
Lappeenranta University of Technology, Skinnarilankatu, Lappeenranta, Finland
To cite this article:
Pavel Layus, Paul Kah. Bibliometric Study of Welding Scientific Publications by Big Data Analysis. International Journal of Mechanical Engineering and Applications. Vol. 3, No. 5, 2015, pp. 94-102. doi: 10.11648/j.ijmea.20150305.13
Abstract: Researchers are nowadays overloaded with scientific information, and it is often difficult to obtain a clear overview of existing topical research in some particular field. Big data tools and instruments can be utilized to define trending research topics by analyzing recent publications. This paper analyses 12000 articles related to arc welding from the Scopus database for the period 2001-2012 using VOS viewer and Microsoft Excel. The most commonly occurring keywords are presented statically and as a time series. The results of this paper provide an overall landscape of scientific research in the field of arc welding and help indicate trends of emerging topics in welding research. This work is of value to both industry and academia as an indicator of changes in the field and areas of current interest. Some guidelines for potential future research on the subject are provided.
Keywords: Bibliometrics, Scopus, Keywords, VOS Viewer, Big Data, Research Trends, Welding
1. Introduction
The science of welding includes a great variety of research fields, such as metallurgy, mathematical modelling, physics, thermodynamics, heat transfer and many others. While the joining of metals has been practised for centuries, modern welding technology began to develop with the advent of electricity in the 19th century. Since that time, the science of welding has been at the core of numerous outstanding engineering achievements. Currently, welding is the primary way of joining metallic materials, and it plays a major role in the automotive, steel structure, shipbuilding, agricultural and many other manufacturing industries. The global welding products market was valued at USD 17.47 billion in 2013 and is expected to reach USD 23.78 billion by 2020, expanding at a compound annual growth rate (CAGR) of 4.5% between 2014 and 2020. A wide range of review articles and books (for instance [2-9]) have been written on welding science, and reading them can help in gaining a deeper understanding of scientific research in the field of welding.
Research data on various topics, including welding, are nowadays more accessible than ever, thanks to the development of extensive online bibliographic databases containing abstracts and citations of academic journal articles. Notable examples of such databases are Scopus and Web of Science, through which the abstracts and full texts of most recently published articles can be accessed. The vast amount of available information, which is growing from year to year, is challenging to use and analyze efficiently. This challenge calls for the development of suitable approaches and tools to convert big data into understandable, usable and practical information. Research and development institutions and industry have become aware of the potential of big data to provide competitive advantage, and bibliometric analysis has been used in a number of fields as a way of highlighting and delineating trends [11-20].
Despite the huge number of publications on welding, no attempt has yet been made to conduct a bibliographic analysis of scientific publications in the field of welding. Moreover, there have been no efforts to evaluate trends in the field. A research study addressing these topics is essential to reveal and explain developments in the field, to bring a deeper understanding of the impact of research on the literature, and to provide comprehensive advice for future research in the field of welding. The motivation of this research is to find out which aspects of the science of welding were in the research spotlight during the last decade.
The current paper attempts to present the most prominent trends and highlights in the field of arc welding research conducted over the last decade (2001-2012) by performing an in-depth bibliographic analysis of scientific publications. This research work applies the approaches and tools of big data to quantitatively analyze about 10000 scientific journal publications related to the topic of arc welding. The hypothesis of this study is that, using numerical analysis of a quantitatively collected dataset, it is possible to spot current trends and important topics in scientific research in the field of welding, and to find out which topics are of increasing interest and which are of decreasing importance.
This broad bibliographic study of scientific publications on welding could be valuable to several beneficiary groups, such as researchers, educators and industrial professionals. Researchers would receive an instrument that lets them determine previous and current research highlights as well as formulate possible future topics for the science of welding. Educators would get information regarding the overall landscape of the science of welding, which would allow them to incorporate it in the teaching materials for modern welding technology courses and training. Industrial professionals would obtain in-depth information and concepts to support them in planning and carrying out their research and development as well as commercial projects in the field of welding technology. The value of this paper is that it presents an overall landscape of scientific research in arc welding, based on quantitative data, revealing various trends in the field. In addition, it further demonstrates the worth of bibliometric analysis as a basis for research on trends in engineering.
1.1. Big Data in the Context of Bibliometric Research
Big data analysis of scientific publications allows trends to be measured on the basis of bibliometric information such as keywords, date of publication, references and other records. Bibliometric data analysis has been applied in many research areas, including environmental assessment, sustainable hydropower development, nucleation techniques, risks of engineering nanomaterials and even intercultural relations. Datasets of scientific publications for analysis can be constructed from various databases, of which the most important and comprehensive in the context of this study are the Scopus and Web of Science databases. These databases offer some tools to analyze scientific datasets; their functionality is, however, rather limited. Specialist software products for bibliometric research are available that offer more tools and functions, such as the software tool used in this work, VOS viewer.
1.2. Comparison of Scopus and Web of Science
Several scientific document databases are available, of which Scopus and Web of Science are among the leading database services. Although the two databases differ in coverage, they can be considered to be complementary, and which database is most suitable depends on the discipline, topic and period of analysis . This study began with a comparison of the Scopus and Web of Science databases and selection of the most suitable database.
The Scopus database contains approximately 34278 journal entries, whereas Web of Science contains about 16957 journals. Scopus and Web of Science might include journals with an extremely low impact factor, indicating potentially low-quality publications. In an attempt to select higher quality journals, it was decided to match records from Scopus and Web of Science with the Ulrich database, which contains a large number of highly rated journals. A total of 20464 Scopus journals and 13607 Web of Science journals matched the Ulrich database.
It can be seen that Scopus contains about twice as many journal entries as Web of Science. However, the distribution of journal topics differs between Scopus and Web of Science. Since the primary focus of this research is welding, journals on Engineering and Technology are of interest. Of the journals indexed in Scopus, 33% are in the Natural Sciences and Engineering domain. Web of Science shows a higher percentage of journals in the same domain, 43%. Nevertheless, the number of indexed journals is significantly larger in Scopus, and therefore Scopus still represents the larger dataset in this domain (6730 > 5810 journals). The distribution of journal topics in the Scopus and Web of Science databases is shown in Fig. 1.
In Fig. 1, HS denotes Health Sciences journals, NSE Natural Sciences and Engineering journals, AH Arts and Humanities journals, and SS Social Sciences journals.
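The comparison of Natural Sciences and Engineering journal counts can be checked with a few lines of arithmetic. The sketch below assumes that the quoted shares (33% and 43%) apply to the Ulrich-matched journal counts; the function name is hypothetical. Because the shares are rounded, the estimates come out slightly above the 6730 and 5810 quoted in the text.

```python
# Journal counts matched against the Ulrich database (taken from the text).
matched_journals = {"Scopus": 20464, "Web of Science": 13607}
# Rounded share of journals in the Natural Sciences and Engineering (NSE) domain.
nse_share = {"Scopus": 0.33, "Web of Science": 0.43}


def nse_journal_estimate(database: str) -> int:
    """Estimate the number of NSE journals from the rounded share (assumption)."""
    return round(matched_journals[database] * nse_share[database])


for name in matched_journals:
    print(f"{name}: about {nse_journal_estimate(name)} NSE journals")
```

The ordering is the point of the check: a smaller percentage of a larger database can still yield the larger absolute number of journals.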
Although Scopus contains more journals in the natural sciences and engineering domain, this does not mean that Scopus contains more articles on welding. Therefore, the next step of the database comparison was to evaluate the number of scientific articles on welding. This evaluation can be made by searching for article titles that contain the word "weld" with all possible endings, e.g. welding, welded. The selected timeframe was 2000-2014. As can be seen from Fig. 2, Scopus contains more entries with the word "weld" or one of its variants in the title for each year during 2000-2014.
In view of the larger journal coverage and greater number of scientific articles on welding, the Scopus database was selected as the source of the dataset for this study.
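The title search described above can be illustrated offline on exported records. The following is a minimal sketch, not the Scopus search engine itself: it assumes records are available as (year, title) pairs, and the sample titles are invented for illustration only.

```python
import re
from collections import Counter

# Matches "weld" at the start of a word with any ending:
# weld, welds, welding, welded, weldability.
WELD_PATTERN = re.compile(r"\bweld\w*", re.IGNORECASE)


def count_weld_titles_per_year(records):
    """Count records per year whose title contains "weld" or a variant.

    `records` is an iterable of (year, title) pairs (hypothetical layout).
    """
    counts = Counter()
    for year, title in records:
        if WELD_PATTERN.search(title):
            counts[year] += 1
    return dict(sorted(counts.items()))


# Invented sample titles, for illustration only.
sample = [
    (2000, "Welding of duplex stainless steel"),
    (2000, "Heat transfer in cast iron"),
    (2001, "Fatigue of welded joints"),
    (2001, "Weldability of high strength steels"),
]
print(count_weld_titles_per_year(sample))
```

Running the same count on exports from two databases gives per-year series that can be compared directly, as in Fig. 2.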
2. Analysis Process and Setup
The data for the analysis were first acquired using the search query "weld*". After manual cleaning of the dataset, about 14000 indexed scientific articles published in English and listed in Scopus for 2001-2012 in the field of arc welding processes were selected. After the dataset was finalized, the article entries were analyzed using Scopus tools, VOS viewer 1.5.7 and Microsoft Excel.
2.1. Scopus Analysis
Scopus has a set of tools for analyzing an article dataset. In this research, the total share of arc welding articles and its yearly distribution are based on the results of the Scopus tools. The size of the Scopus database increases by 8.7% per year on average. The number of journal articles on welding is also increasing yearly, but at a slower pace of 5% on average. Therefore, the number of publications on welding as a share of total publications is decreasing. A possible explanation is an increase in the range of welding-related keywords introduced each year, as science becomes more diverse. However, it is also possible that research activity in welding and related sciences is declining relative to other fields.
Fig. 3 illustrates the number of papers published in the field of welding over 2001-2012. It can be seen that the number of articles follows a generally increasing trend; however, the jump in 2008-2009 is an interesting phenomenon. The authors have not found any obvious facts or trends to explain this sudden change. The decrease in the number of publications during 2010 might be due to not all publications having been indexed and uploaded to Scopus.
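The effect of the two growth rates on the share of welding publications can be sketched as follows. The calculation assumes, as a simplification not stated in the text, that both average rates (5% and 8.7%) compound at a constant pace every year; the function name is hypothetical.

```python
def relative_share(years: int, field_growth: float = 0.05,
                   total_growth: float = 0.087) -> float:
    """Share of the field after `years`, relative to its share in year 0.

    Assumes both growth rates are constant and compound annually.
    """
    return ((1 + field_growth) / (1 + total_growth)) ** years


for years in (1, 5, 11):
    print(f"after {years:2d} years: {relative_share(years):.3f} of the initial share")
```

Under this assumption the share shrinks by a few percent each year, so a steadily growing article count and a falling share are fully compatible.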
2.2. VOS Viewer 1.5.7 Analysis
VOS viewer 1.5.7 software allows graphical representation of keyword clustering. The keywords are clustered to show their connection to each other and to indicate the degree of similarity in meaning. The software analyses which keywords occur together most often in the same article and places them close to each other on the plot. The results of the clustering process are presented in Fig. 4. It can be seen that keywords are divided into five clusters, which were manually labeled based on knowledge of welding research topics as:
• Materials and microstructure
• Mechanical properties and mechanical fracture modes
• Welding parameters and chemical properties and processes
• Computer tools and welding processes
As can be seen in Fig. 4, some keywords do not perfectly fit into the clusters; however, the vast majority of keywords allow cluster borders to be defined. Fig. 4 illustrates another problematic area of this approach, as some keywords are non-descriptive, such as "effect", "process" or "weld". In the current analysis, these non-descriptive keywords do not significantly influence the overall picture, as the clusters are clearly defined.
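The raw input to this kind of clustering is a keyword co-occurrence count. The sketch below shows only that counting step under simple assumptions (case-insensitive matching, each article given as a list of keywords); it does not reproduce the normalization, layout or clustering that VOS viewer applies, and the sample articles are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(articles):
    """Count how often each unordered keyword pair appears in the same article."""
    pairs = Counter()
    for keywords in articles:
        # Deduplicate and sort so each pair has one canonical ordering.
        unique = sorted({k.strip().lower() for k in keywords})
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


# Invented sample keyword lists, for illustration only.
articles = [
    ["Microstructure", "Austenite", "Stainless steel"],
    ["Microstructure", "Stainless steel", "Corrosion"],
    ["Fatigue", "Stress"],
]
pairs = cooccurrence_counts(articles)
print(pairs.most_common(3))
```

Pairs with high counts are the ones a mapping tool would place close together, while non-descriptive keywords such as "weld" co-occur with almost everything and carry little information.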
2.3. Microsoft Excel Analysis
Microsoft Excel analysis was performed to present the overall research topics and to illustrate trends in scientific research in the field of welding. In order to present trends, the twenty most commonly occurring keywords were selected for each year. The next step was to enter them in Excel together with the number of occurrences of each keyword in the selected period. Related keywords were manually combined into ten groups based on the authors' knowledge of scientific research in the field of welding. The resulting groups correlated closely with the VOS viewer clusters (Fig. 4), which suggests that the manual choice of keyword groups was appropriate. Some keywords, such as "weld" and "welding", were omitted, as they are non-descriptive. Fig. 5 shows the percentage of each keyword group. It can be seen that materials, mechanical properties, mechanical failure modes and computer tools have the largest shares of 18%, 13%, 13% and 12%, respectively.
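The spreadsheet steps described above can also be expressed in code. This is a minimal sketch rather than the authors' Excel workbook: it assumes records are (year, keywords) pairs, the keyword-to-group mapping is a hypothetical stand-in for the manual grouping, and the counts in the example are invented.

```python
from collections import Counter, defaultdict


def top_keywords_per_year(records, n=20):
    """Return {year: [(keyword, count), ...]} with the n most frequent keywords."""
    per_year = defaultdict(Counter)
    for year, keywords in records:
        per_year[year].update(k.lower() for k in keywords)
    return {year: counts.most_common(n) for year, counts in per_year.items()}


def group_shares(keyword_counts, groups):
    """Percentage of occurrences per group; unmapped keywords are ignored."""
    totals = Counter()
    for keyword, count in keyword_counts.items():
        group = groups.get(keyword)
        if group is not None:
            totals[group] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {g: 100 * c / grand_total for g, c in totals.items()}


# Hypothetical grouping and invented counts, for illustration only.
groups = {
    "microstructure": "Materials",
    "stainless steel": "Materials",
    "fatigue": "Mechanical failure modes",
}
counts = {"microstructure": 6, "stainless steel": 2, "fatigue": 2, "weld": 5}
print(group_shares(counts, groups))
```

Leaving "weld" out of the mapping mirrors the omission of non-descriptive keywords, so it does not dilute the group percentages.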
The largest keyword groups (i.e. materials, computer tools, mechanical failure modes, and mechanical properties) consisted of the keywords listed in Table 1.
|Keyword group|Keyword(s)|Number of occurrences|Total|
|---|---|---|---|
|Materials|Microstructure (Base metal, Ferrite, Austenite, HAZ, Martensite, Solidification, Weld metal)|3391|8531|
||Stainless steel (304 stainless steel, Austenitic stainless steel, Duplex stainless steel, Martensitic stainless steel)|1697||
||Chemical composition (Chlorine compounds, Hydrogen, Carbides, Carbon, Iron, Silicon, Tungsten)|1182||
||Steel (Carbon steel, Ferritic steel, High strength steels, Low carbon steel)|822||
|Mechanical failure modes|Corrosion (Pitting Corrosion, Coatings, Painting)|1528|5877|
||Fatigue (Fatigue crack propagation, Fatigue damage, Fatigue life)|1339||
||Fracture (Fracture mechanics, Brittle fracture)|929||
|Mechanical properties|Stress (Yield stress, Stress analysis, Stress concentration, Stress intensity factors)|1488|6319|
||Strength of materials|575||
||Loading (Cyclic loading, Structural analysis)|528||
||Tribology (Friction coefficient)|203||
Table 1 shows rather expected results; for instance, the most frequent keyword in the materials group is microstructure (including its subtopic keywords). Microstructure is one of the most important topics in welding, as the mechanical properties of the weld are defined by the type of microstructure [25-27].
Articles dedicated to microstructure are mostly written about the heat-affected zone (31%), which was expected since it is closely connected to arc welding processes. The keywords "weld metal" and "base metal" are represented by significant percentages, 15% and 12% respectively. Other important topics are the iron phases austenite (15%), ferrite (13%) and martensite (8%). The distribution is presented in Fig. 6a.
Likewise, it can be seen that welding of stainless steels is an important topic in the materials domain, as stainless steel is the second most popular keyword, including the subtopic keywords (304 stainless steel, austenitic stainless steel, duplex stainless steel, and martensitic stainless steel). Over half of the journal articles on welding of stainless steel are dedicated to austenitic stainless steel (61%). Other important research areas include martensitic (17%) and duplex stainless steels (13%). The distribution is presented in Fig. 6b.
The next step of the study was to perform time series analysis to determine research trends. The time series analysis results for selected keyword groups and the distribution of keywords within each group are shown in Fig. 7.
The decline in 2012 most likely occurred due to incomplete Scopus records for recent years, and it is expected to disappear in the next couple of years, when all papers from 2012 are indexed. The keyword group materials shows an interesting increase in 2008. One possible explanation is the increase of interest in dissimilar materials welding. The decline in testing and examination methods in the last years may be connected with the fact that this field of science is well established and does not attract many researchers.
Another group of keywords that deserves attention is arc welding processes, presented in Fig. 8. The number of articles related to TIG welding steadily increased from 2008 to 2012, which may be connected with the intensive development of robotic welding systems, which often utilize TIG welding processes. The number of GMAW-related articles reached a maximum in 2009, after which the prevalence of this topic decreased. The drop in the number of articles may be due to GMAW being a rather well-known welding procedure and therefore attracting less research interest in recent years. The number of studies on multipass welding is increasing, from 3 in 2007 to 20 in 2012, mostly due to an increase in publications related to simulation of multipass welding and predictions of the welding process outcome.
The major limitations of the study are primarily related to the selected dataset: the particular types of academic publications selected, the selection of English language publications only, incomplete records for some academic publications and the relatively short time period covered.
The Scopus database presents various types of scientific works, including journal articles, conference proceedings, book series, trade publications, books and reports. However, the research presented here only analyzes journal articles, which represent 53.7% of the total number of publications. Journal articles usually demonstrate research credibility, as in most cases they are peer-reviewed, which indicates relatively high scientific value of the research. Journal articles form the largest part of publications on Scopus compared with other publication groups (e.g. conferences, books), thus ensuring a sufficiently large dataset for statistically valid analysis.
During the current research work it was decided to focus solely on the period from 2001 to 2012. The end point of 2012 was chosen because some of the latest research publications from 2013-2015 may not yet have been uploaded to the Scopus database. In a preliminary study, the authors noted that a number of articles were added to the Scopus database only a couple of years after the publication date. Therefore, in order to obtain a more accurate dataset, it was decided to use data only up to the end of 2012. Publications prior to 2001 are also not included, because a significant part of these papers might not be included in the Scopus database.
Another limitation of this research comes from possible inaccuracy in the keyword analysis of the publications. The Scopus database contains many keywords with identical meaning but different grammatical form or spelling, and despite manual keyword cleaning, the dataset keywords may not be completely accurate. The error can be of the order of 1-3%. Since this research considers only publications on arc welding, papers on other welding processes, such as laser or friction welding, were excluded from the dataset. The exclusion was performed by removing articles with corresponding keywords, such as laser or friction. This process might also influence the accuracy of the results, since articles that contain both arc and friction welding keywords are excluded.
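The keyword cleaning and exclusion steps described above can be sketched as follows. The variant map and the helper names are hypothetical, and the exclusion uses plain substring matching on the terms named in the text, which is an assumption about how the filter might be implemented rather than the procedure actually used.

```python
# Hypothetical variant map: spelling/grammatical variants -> canonical keyword.
CANONICAL = {
    "heat affected zone": "haz",
    "heat-affected zone": "haz",
    "welds": "weld",
    "stainless steels": "stainless steel",
}
# Terms marking non-arc processes, as named in the text.
EXCLUDED_TERMS = ("laser", "friction")


def normalize(keyword: str) -> str:
    """Lowercase, collapse whitespace and map known variants to one form."""
    key = " ".join(keyword.lower().split())
    return CANONICAL.get(key, key)


def is_arc_welding_article(keywords, excluded_terms=EXCLUDED_TERMS) -> bool:
    """Return False if any keyword mentions an excluded process."""
    return not any(
        term in normalize(keyword)
        for keyword in keywords
        for term in excluded_terms
    )


print(is_arc_welding_article(["GMAW", "Microstructure"]))
print(is_arc_welding_article(["GMAW", "Friction stir welding"]))
```

The second call illustrates the accuracy issue mentioned above: an article covering both GMAW and friction stir welding is dropped. Plain substring matching on "friction" would also drop a keyword such as "Friction coefficient", so a practical filter would likely need more specific terms.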
Further limitations of this study are related to the accuracy and quality of the records obtained from the Scopus database. The obtained dataset of 14882 entries was manually examined, and entries with incomplete records were discarded. A total of 13960 articles (94%) had complete records and were used for further analysis in this study. The Scopus database has been reported to contain up to about 13% duplicate entries. Some duplicate entries might be left in the dataset after manual cleaning and might have an influence on the accuracy of the results.
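A programmatic version of this record cleaning could look like the sketch below. The choice of required fields and the duplicate key (normalized title plus year) are assumptions made for illustration; the study itself describes a manual examination, and the sample records are invented.

```python
REQUIRED_FIELDS = ("title", "year", "keywords")  # hypothetical choice of fields


def clean_records(records):
    """Drop incomplete records, then drop duplicates by (normalized title, year)."""
    seen = set()
    cleaned = []
    for record in records:
        # A record is incomplete if any required field is missing or empty.
        if any(not record.get(field) for field in REQUIRED_FIELDS):
            continue
        key = (" ".join(record["title"].lower().split()), record["year"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Invented sample records, for illustration only.
raw = [
    {"title": "Fatigue of welded joints", "year": 2005, "keywords": ["fatigue"]},
    {"title": "Fatigue of  Welded Joints", "year": 2005, "keywords": ["fatigue"]},
    {"title": "TIG welding of steel", "year": 2008, "keywords": []},
    {"title": "GMAW process modelling", "year": 2009, "keywords": ["gmaw"]},
]
cleaned = clean_records(raw)
print(len(cleaned), "of", len(raw), "records kept")

# Share of complete records reported in the text: 13960 of 14882.
complete_share = 13960 / 14882
print(f"{complete_share:.1%} complete")
```

Exact-title matching only catches the simplest duplicates, so some duplicate entries could still remain, consistent with the caveat above.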
The current research presents the first bibliometric work on scientific research in the field of arc welding. The study provides an overview of welding research articles for the period 2001-2012. The analysis is based on bibliometric information, such as keywords, date of publication, references and other records, which were processed with Scopus tools, VOS viewer and Microsoft Excel.
It was found that the number of publications on the topic of welding increased by 5% year-on-year. Keywords related to materials, mechanical properties, mechanical failure modes and computer tools had the largest shares of 18%, 13%, 13% and 12% respectively. Articles about microstructure were mostly related to the heat-affected zone (31%), which is a major concern for microstructural changes in welding. The keywords "weld metal" and "base metal" were represented by significant percentages, 15% and 12% respectively. Likewise, welding of stainless steel is an important topic in the materials domain, as stainless steel is the second most popular keyword, including its subtopic keywords: 304 stainless steel, austenitic stainless steel, duplex stainless steel, and martensitic stainless steel. Over half of the journal articles on stainless steel are dedicated to austenitic stainless steel (61%). Other important research areas include martensitic (17%) and duplex stainless steels (13%).
Time series analysis showed a decline in the overall number of arc welding related articles in 2012, which is most likely due to incomplete Scopus records for recent years; the decline is expected to disappear in the next couple of years, when all papers from 2012 are indexed. Articles on topics related to materials showed an interesting increase in 2008. When considering arc welding processes, the number of articles on TIG welding increased steadily from 2008 to 2012, which may be connected with the intensive development of robotic welding systems, which often utilize TIG welding processes. Articles with GMAW as a keyword reached a maximum in 2009, after which their number fell. A possible reason for this trend may be that GMAW is a rather well-known procedure and therefore of decreasing research interest. The number of studies on multipass welding is increasing, mostly due to the increasing number of publications related to simulation of multipass welding and predictions of the welding process outcome.
The study has some limitations, most importantly the short time period studied and the types of scientific publications considered. However, despite its limitations, the study illustrates the theoretical and practical relevance of combining bibliometric or statistical datasets, indicates how such study can be approached, and gives pointers to issues that can be addressed using analysis of a large bibliometric dataset.
Future research on similar topics can provide additional, more detailed information and can reveal more specific aspects of research trends. Future studies might analyze not only the keywords of journal articles but also other bibliometric data, such as references or abstracts.
The authors would like to acknowledge the help of Prof. Leonid Chechurin with the formulation of the idea and his advice during the initial stage of the current research.