PubTrends

PubTrends is an exploratory tool for researchers providing faster trends analysis and breakthrough papers discovery among the steadily growing flow of papers worldwide. The service aims to solve three tasks: give a brief overview of the field, explore popular trends in publications, and help to find new promising directions.

Contents

Datasets

PubTrends contains 30 mln papers and 175 mln citations of biomedical literature from the PubMed® database with 170 mln papers and 600 mln citations from the Semantic Scholar archive. Semantic Scholar aggregates significant journals and publishers, including Springer Nature, ACM, etc.

Workflow

Main page

PubTrends main page is the start point for papers analysis, several analysis options are supported.

All the analysis is based on similarity between papers. Similarity is computed based on Bibliometrics features, i.e. bibliographic coupling (number of common references in a pair of papers), co-citations (when a pair of papers are cited together), direct citations, and text similarity between papers. This information is used to create papers citations graph.
Citations graph is later used to compute papers graph embeddings. Texts of titles and abstracts are used to compute papers text embeddings. Combined graph and text embeddings they are used for papers similarity analysis and topics identification by clustering.
Finally, the user gets full report covering all the aspects of analysis.

Learn more exploring one of precomputed search queries from the PubTrends main page.

Example

Here we describe the analysis for the predefined search query "human aging".
We focus on 1000 most cited papers from the PubMed, with review papers, extending search set with connected papers by 20%.
The main report page contains all the analytics and consists of several parts: Papers, Trends, Network, Topics, Review and Other.
Side bar on the left of the page can be used for navigation. Please use About button for the help.

Papers

The Papers section demonstrates a birds-eye view of the field, including the total number of articles, and extracted topics. Word cloud shows the most frequent words in titles and abstracts. Also, it contains a summary plot of papers per year. Please note that the word cloud component is clickable, and you can navigate to the documents containing the selected word. Papers can be viewed as a plain list, as well.
Topics were computed by hierarchical clustering of papers embeddings based on text and graph embeddings.


Topics by year plot shows dissection of papers by topics and the number of papers for each topic by year.
Each topic is described by main keywords and total papers share.

The Trends section contains an interactive visualisation of top-cited papers, organised by number and citations count. Different types of articles are shown in different colours.
Most cited papers and papers with the quickest growth of citations are also shown here. All the papers are clickable, and we can explore details on a separate page.
Also the user can explore frequent keywords trends mentioned in papers.

Top Cited papers plot are shown in the plot, where user can see most important papers publication year, total number of citations and additional information about the paper on mouse hover.

Hot papers plot shows papers with the biggest number of citations by year.

Keywords frequency plot shows most frequent terms mentioned in papers and evolution in time.
Exact number of papers containing keyword is available on hover.

Network

Topics are closely related groups of documents. Aggregated graph and text embeddings are used to find similar papers and detect topics.
Overall structure of topics within a research field can be visualised as a papers similarity graph.


The graph behind shows the overall structure of the research field, hovered dedicated graph explorer can do much better. It supports papers coloring by year or by topic, provides reach capabilities for search and filtering in papers meta-information available. On the screenshot you can see papers coloured by different topics and the paper is highlighted with its connected papers.


Topics

Topics and identified using hierarchical clustering of papers embeddings, the user can explore topics hierarchy dendrogram in the dedicated plot.


For each topic, the application shows familiar to users word cloud and articles plot. Word cloud is built from terms specific to the given topic with respect to others. The more important word is the more significant fraction of papers contains it.
The topic below is dedicated to molecular makers of human aging including DNA methylation changes and telomeres shortening related with human aging.

Review

Generate a review for the chosen topic - a set of sentences from top cited papers with the highest probability to be included in a real review paper.

Review generation mechanism is described in the Open Access Paper: https://arxiv.org/abs/2010.04147
Citation: Nikiforovskaya, A., Kapralov, N., Vlasova, A., Shpynov, O. and Shpilman, A., 2020, December. Automatic generation of reviews of scientific papers. In 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 314-319). IEEE.

Other

Sections Authors and Journals shows the most productive authors and the popular journals in the analyzed set of papers.

Section Numbers allows for quick identification of numbers hidden in the set of papers.
The service scans papers for quantitative features mentioned in titles or abstracts and presents the user table with search capabilities.

Topics evolution analysis in an experimental feature, when topics identification is performed longitudinally at several time points to detect merges and splits of topics.
The user can inspect evolution of the topics.

Paper

Open Access Paper: https://doi.org/10.1145/3459930.3469501
Poster is available here.
Citation: Shpynov, O. and Kapralov, N., 2021, August. PubTrends: a scientific literature explorer. In Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 1-1).


Feedback

Feel free to fill the feedback form on the bottom of page or use emotions buttons to share your thoughts.
Also you can optionally leave your email to contact you later.

This is an open-source project, you can explore the code or submit your issues directly to the GitHub at https://github.com/JetBrains-Research/PubTrends.