The toolkit can help data scientists and policy makers understand COVID-19 data in real time, the company says.
As the COVID-19 pandemic continues to spread across the globe, finding the right tools to detect and track the spread has become urgent. On Wednesday, America saw 34,700 new COVID-19 cases in a single day, the highest daily total since late April.
A new open-source toolkit called COVID notebooks, offered Thursday by IBM’s Center for Open-Source Data and AI Technologies, could help address the issue. The toolkit is designed to perform basic tasks such as gathering and collating current data on the outbreak, cleaning up the data, and creating reports and graphs that quickly illustrate the spread in real time.
The toolkit uses Jupyter Notebooks for data analysis and creates data processing pipelines through the Elyra Notebook Pipelines Visual Editor and KubeFlow Pipelines. IBM says this will relieve data scientists of routine work so they can turn their attention to higher-level tasks.
The volume of data from social media, news, health organizations, and other outlets is constantly growing and is beginning to overwhelm data scientists and policy makers. The company says its new tool could help streamline data analysis, connect the dots about the real impact of the disease, and show which regions need attention.
Data analysts can use COVID notebooks to create their own analyses in real time at the county level, and can even analyze data by conditions such as poverty level. The toolkit draws from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, which has been a main source for organizations that have been generating predictions for the Centers for Disease Control and Prevention.
To help fill in the gaps, the COVID notebooks also draw from a range of other sources, such as the New York Times Coronavirus (Covid-19) Data in the United States and the European Centre for Disease Prevention and Control’s data on the geographic distribution of COVID-19 cases worldwide.
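As a rough illustration of what a county-level analysis might involve, the sketch below turns cumulative case counts into daily new cases per 100,000 residents. It is not taken from the COVID notebooks themselves. The wide layout (one column per date), the column names, the helper `daily_new_cases_per_100k`, and all of the numbers are assumptions made up for the example.

```python
import pandas as pd

# Synthetic, made-up cumulative case counts in a wide layout: one row per
# county, one column per date. Illustrative only, not real data.
cumulative = pd.DataFrame(
    {
        "county": ["County A", "County B"],
        "population": [50_000, 200_000],
        "2020-06-22": [100, 400],
        "2020-06-23": [110, 420],
        "2020-06-24": [125, 460],
    }
)


def daily_new_cases_per_100k(wide: pd.DataFrame) -> pd.DataFrame:
    """Convert cumulative counts to daily new cases per 100,000 residents."""
    date_cols = [c for c in wide.columns if c not in ("county", "population")]
    # Reshape to one row per (county, date) so each county forms a time series.
    long = wide.melt(
        id_vars=["county", "population"],
        value_vars=date_cols,
        var_name="date",
        value_name="cumulative_cases",
    )
    long["date"] = pd.to_datetime(long["date"])
    long = long.sort_values(["county", "date"]).reset_index(drop=True)
    # Differencing the cumulative series within each county gives new cases;
    # the first day of each county has no previous value and stays NaN.
    long["new_cases"] = long.groupby("county")["cumulative_cases"].diff()
    long["new_per_100k"] = long["new_cases"] / long["population"] * 100_000
    return long


result = daily_new_cases_per_100k(cumulative)
print(result[["county", "date", "new_cases", "new_per_100k"]])
```

Normalizing by population is one common way to make counties of different sizes comparable. A similar join against another county-level table, such as poverty rates, would follow the same pattern.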
COVID notebooks are built on dataframes from the Pandas data analysis library, and use the TensorArray extension type from IBM’s Text Extensions for Pandas library to keep time series tensors in the cells of those dataframes. IBM’s team also “leveraged the graphical workflow editor that we have built as part of the Elyra project to tie our notebooks into workflows that you can run each day as new data becomes available.”
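To give a sense of what keeping a time series in a dataframe cell looks like, here is a minimal sketch that uses plain NumPy arrays in an ordinary object column. This is a stand-in for the idea only. It does not use the TensorArray type or reproduce IBM's code, and the region names, column names, and numbers are invented for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic cumulative counts: one row per region, one whole time series per
# cell. A plain object column stands in here for the TensorArray type.
df = pd.DataFrame({"region": ["Region A", "Region B"]})
df["cases"] = pd.Series(
    [np.array([1, 3, 6, 10]), np.array([2, 2, 5, 7])],
    dtype=object,
)

# Derive a second series-valued column: day-over-day increases per region.
df["new_cases"] = df["cases"].apply(lambda series: np.diff(series))

# Reduce each region's series to a single scalar summary.
df["peak_daily_increase"] = df["new_cases"].apply(lambda series: int(series.max()))

print(df[["region", "peak_daily_increase"]])
```

Keeping one row per region, with the full series in a single cell, means per-region metadata and per-region time series can live side by side in the same table. A dedicated extension type would presumably handle this more efficiently than generic Python objects, though that is an assumption rather than something the announcement spells out.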
“IBM and our team believe in the importance of democratizing technology, activating developers with the most up-to-date datasets and tools, which can help policy makers make the most informed decisions for citizens’ well-being,” the press release states.
Those interested in getting started can begin by using the repo to create an analysis. Contributions from developers and data scientists to the GitHub repository are also welcome, according to IBM.