IEEE International Workshop on Data Science Systems (DSS)
Sponsored by IEEE Big Data 2021
http://www2.cs.uh.edu/~dss/DSS/home.html

Scope

Data Science has subsumed Big Data Analytics as an interdisciplinary endeavor in which the analyst uses diverse programming languages, libraries and tools to integrate, explore and build mathematical models on data, in a broad sense. Nowadays, there is a new trend of building systems and novel approaches that enable analysis on practically any kind of data: big or small, unstructured or structured, static or streaming, and so on. Data science has become the umbrella discipline for analyzing data, from detecting data quality problems, through time-consuming data pre-processing, to building sophisticated Machine Learning models such as deep neural networks.

In the Data Science Systems (DSS) workshop we welcome interdisciplinary research mixing programming languages, machine learning, database systems and high-performance computing. The DSS workshop will feature "systems" research to enable data science on big data (with large-scale parallel processing), but also on "medium scale" data (a powerful workstation with multicore CPUs). Specifically, we welcome papers that present algorithms, data structures, functions, language extensions and optimizations that work well in modern data science languages, especially Python, R and SQL. It is fair to say that modern analysts can trade off some performance for ease of programming, ease of use or flexibility. Advances in hardware are making the cloud more attractive for data pre-processing and a local machine preferred for number crunching.

Important Dates

Oct 11, 2021: Paper submission (short, full)
Nov 5, 2021: Notification of paper acceptance
Nov 21, 2021: Camera-ready for accepted papers
Dec 15-18, 2021: Workshop (exact date to be determined)

Topics

Data quality diagnosis and repair, which can be tweaked and customized in DSS languages
Interoperability of diverse data pre-processing programs, working with different file formats
Querying relational and non-relational data, but outside database systems
Splitting processing between DSS languages and Big Data systems (e.g. R or Python runtime alone and PySpark in a cluster)
Extending data science languages with new operators and functions (e.g. NumPy)
Accelerating ML algorithms (statistical summarization, stochastic gradient descent)
Enabling database query functionality in data science languages (e.g. optimizing Pandas code)
Cross-language optimization (e.g. optimizing R bottlenecks with C/C++ code)
Splitting processing between data science languages and database languages (e.g. Python and SQL)
Novel parallel data processing architectures (e.g. combining parallel DBMSs, Hadoop and other distributed architectures)
Exploiting new-generation file systems beyond HDFS (HPC file systems)
Flexible, fast, well-defined interfaces to exchange big data (e.g. CSV, JSON files)
Exploiting HPC libraries like LAPACK and MKL in data science languages
Web interfaces for complex processing pipelines (e.g. JavaScript GUI, calling Python)
Benchmarks, understanding tradeoffs between time performance and ease of use (i.e. the fastest is not necessarily the best alternative)
Case studies, presenting technical details of a library or program that can be used by data analysts across several specialties (i.e. how it was programmed and deployed on a given OS)

Program Co-Chairs

Ladjel Bellatreche, ENSMA, France
Carlos Ordonez, University of Houston, USA

Program Committee Members

Alejandro Aguilar, UNAM, Mexico
Anirban Mondal, Ashoka University, India
Nabil Layaïda, INRIA, Grenoble, France
Steven Euijong Whang, KAIST, Korea
Jorge Bernardino, ISEC - Polytechnic Institute of Coimbra, Portugal
Laurent D’Orazio, Univ Rennes, CNRS, IRISA, France
Carson Leung, University of Manitoba, Canada
Predrag Tosic, Washington State U, USA
Amin Beheshti, Macquarie University, Sydney, Australia
Jorge Galicia, ISAE-ENSMA, France
Soumia Benkrid, ESI, Algiers, Algeria
Philippe Fournier-Viger, Harbin Institute of Technology (Shenzhen), China
Gopal Pandurangan, University of Houston, USA
Panruo Wu, University of Houston, USA
Driss Benhaddou, University of Houston, USA

Best papers will be published in an international journal.