


Visualising COVID-19 related papers with Dash-Cytoscape: check out the app and its source code!

A staggering amount of text data is generated every day. Half a million new tweets are posted every minute; millions of blogs are written every day; and millions of new patents and scientific papers are published every year. More data points also mean (exponentially) more potential relationships between those points. Much as no person is an island, a tweet, a text message, a patent or an academic paper are all connected to each other - by followers, recipients, subject matter, citations or references.

Comprehensively extracting such information has long been arduous and extremely time-consuming, not to mention costly. Whether it is a researcher reviewing publications in their field or a team of lawyers poring over every email, notebook and Post-It note, such a process often takes months, if not years. Spending that kind of time and resources is inadvisable, if not impractical, for most.

Luckily, modern computational analysis tools such as natural language processing (or NLP) can help by greatly speeding up text analysis and reducing error rates.

But these tools cannot (yet) replace all of the domain expertise, contextual understanding, and integrative thinking that reside in human analysts. So, NLP tools or outputs are best deployed as complementary tools for domain experts. Visualizations can be crucial to such integrative deployments. A good visualization can help its users to explore, manipulate and understand the dataset, as well as the outputs from NLP analysis.

In this article, we show you Dash Cytoscape, which lets you visualise and explore datasets and relationships within them using Plotly’s Dash. We will be using a demo app that leverages Dash Cytoscape to visualise thousands of academic papers, grouped by topics generated using Latent Dirichlet Allocation (LDA) techniques and connected by citations. While the analysis was carried out for demonstration purposes only, we hope that you will find our visualisations useful for your own work.

Check out the app and its source code on GitHub, then read on to find out how it was made!

LDA analysis

The dataset that we have used is a subset of the CORD-19 dataset: a result of multiple research groups’ collaborative response to the COVID-19 pandemic. We applied a topic modelling technique called Latent Dirichlet Allocation (LDA) to our dataset of around 4700 academic papers.
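To make the idea of papers "grouped by topics and connected by citations" more concrete, here is a minimal sketch of how such data could be turned into the element list that Dash Cytoscape accepts (nodes and edges as dictionaries with a `data` key). The `papers` records, their field names and the `build_elements` helper are hypothetical and only for illustration; they are not taken from the demo app's source code, where topics would come from the LDA model and citations from the CORD-19 subset.

```python
# Hypothetical paper records. In a real pipeline, `topic` would be assigned
# by the LDA model and `cites` would come from the dataset's citation data.
papers = [
    {"id": "p1", "title": "Paper one", "topic": 0, "cites": ["p2", "p3"]},
    {"id": "p2", "title": "Paper two", "topic": 1, "cites": ["p3"]},
    {"id": "p3", "title": "Paper three", "topic": 0, "cites": ["p9"]},  # p9 is outside the set
]


def build_elements(papers):
    """Convert paper records into Cytoscape elements (nodes first, then edges)."""
    known_ids = {p["id"] for p in papers}

    # One node per paper; the class name lets a stylesheet colour nodes by topic.
    nodes = [
        {
            "data": {"id": p["id"], "label": p["title"], "topic": p["topic"]},
            "classes": f"topic-{p['topic']}",
        }
        for p in papers
    ]

    # One edge per citation, keeping only citations to papers in the dataset.
    edges = [
        {"data": {"source": p["id"], "target": cited}}
        for p in papers
        for cited in p["cites"]
        if cited in known_ids
    ]
    return nodes + edges


def build_app(elements):
    """Wrap the elements in a bare-bones Dash app (requires dash and dash-cytoscape)."""
    # Imported lazily so the element-building code above runs without Dash installed.
    from dash import Dash, html
    import dash_cytoscape as cyto

    app = Dash(__name__)
    app.layout = html.Div(
        cyto.Cytoscape(
            id="paper-network",  # hypothetical component id
            elements=elements,
            layout={"name": "cose"},
            style={"width": "100%", "height": "600px"},
        )
    )
    return app


elements = build_elements(papers)
```

Calling `build_app(elements).run(debug=True)` should serve the network locally, assuming a recent Dash release with `dash-cytoscape` installed. Dropping citations that point outside the dataset is a design choice in this sketch: Cytoscape needs both endpoints of an edge to exist as nodes.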
