The submission site is now open: https://cmt3.research.microsoft.com/PVLDBv14_2021/

   *** Call for Scalable Data Science Papers at PVLDB Vol 14 / VLDB 2021 ***

                            https://vldb.org/2021/
                             Copenhagen, Denmark
                              August 16-20, 2021

Scalable Data Science (SDS) is a newly created submission category within the Research Track of PVLDB Vol 14, with monthly deadlines starting on April 1, 2020. Authors of accepted papers are invited to present their work at the VLDB 2021 Conference.

We solicit submissions of papers describing design, implementation, experience, or evaluation of solutions and systems for practical data science and data engineering tasks, including data management, data engineering, data analytics, data visualization, data quality, data integration, data mining, and machine learning on large-scale data.

Unlike Regular Research papers, papers in this category need not propose new breakthrough algorithms or models; instead, they emphasize solutions that either solve, or advance the understanding of, issues related to data science technologies in the real world.

Papers on deployed solutions should describe the implementation of a system that solves a significant real-world problem and is (or was) in use for an extended period of time in industry, science, medicine, education, government, nonprofit organizations, or as open source. The paper should present the problem, its significance to the application domain, the design choices for the solution, the implementation challenges, and the lessons learned from successes and failures, including post-launch performance analysis. Papers that describe enabling infrastructure for deploying applied machine learning also fall into this category. One example is an open-source, general-purpose entity linkage tool that takes data from any two data sources and links records that refer to the same real-world entity. Another is a paper on a low-latency system that automatically monitors online model predictions on streaming data at scale to detect concept drift and recommend how to react.

Papers on evaluated but not necessarily deployed solutions should describe fundamental insights derived from addressing a real-world problem. These might include papers that provide significant insights into an applied area or domain, or papers that provide strong baselines thoroughly tested on real data. We also encourage papers that conclude that a problem is solved under particular conditions or is infeasible with current techniques. In addition to insights, the paper should explain what milestones were reached, what the practical impact is, and (if applicable) what the obstacles to deployment are. Straightforward improvements over trivial baseline solutions tested on small datasets are unlikely to qualify. Continuing the previous examples, a paper might present an entity linkage model that applies state-of-the-art deep learning techniques and achieves high performance on a few real-world datasets, showing that adaptations of recent techniques can help solve an important and practical data science problem. Similarly, a paper on a system that handles concept drift in streaming prediction applications may apply or extend recent statistical or ML approaches, but should convincingly demonstrate their efficacy and scalability on real-world datasets.

Submissions can be up to eight pages long, plus unlimited pages for references. Papers need not cover all aspects of an application or give all details; instead, we encourage papers with key insights supported by solid data points.

This new category helps bridge the gap between Regular Research papers and Industrial Track papers, which matters especially given the fast-evolving nature of data science. In particular, it differs from the Industrial Track in both scope and the level of impact expected. This category focuses more specifically on new technology for data-science-oriented workloads, while the Industrial Track is more general and covers all aspects of database technology. The Industrial Track focuses on technology that is already commercial, while this category also welcomes work that may not yet be commercial or deployed but is still at the proof-of-concept stage, as long as it is convincingly validated and has good potential for impact. Regarding concurrent submissions, authors may not submit papers on the same work to any other category or track of VLDB, except the Demonstrations Track.

It is our hope that this new category will attract more of the cutting-edge and impactful real-world work in the scalable data science arena to VLDB for the benefit of the VLDB community, including spurring new technical connections, inspiring new follow-on research on scalable data science, and enhancing the impact of the VLDB community on data science practice.

* First submission deadline: April 1, 2020 (5pm PST)
* Submission site: https://cmt3.research.microsoft.com/PVLDBv14_2021/
* Submission guidelines: http://www.vldb.org/pvldb/submission_vol14.html
* More information: http://www.vldb.org/pvldb/contributions_vol14.html

Best regards,

Alon Halevy (Facebook), Arun Kumar (UC San Diego), Nesime Tatbul (Intel Labs and MIT)
PVLDB Vol 14 / VLDB 2021 Scalable Data Science Category Co-Chairs