Call for Workshop Papers

2nd International Workshop on Data Quality Assessment for Machine Learning
in conjunction with
The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021)
Virtual event, August 14th-18th, 2021

In the past decade, AI/ML technologies have become pervasive in academia and industry, finding utility in newer and more challenging applications. While there has been a focus on building better, smarter and automated ML models, little work has been done to systematically understand the challenges in the data and assess its quality issues before it is fed to an ML pipeline. Issues such as incorrect labels, synonymous categories in a categorical variable and heterogeneity in columns, which might go undetected by the standard pre-processing modules in ML frameworks, can lead to sub-optimal model performance. Although some systems are able to generate comprehensive reports with details of the ML pipeline, a lack of insight and explainability with respect to data quality issues leads to data scientists spending ~80% of their time on data preparation before employing AutoML solutions. This is why data preparation has been called out as one of the most time-consuming steps in an AI lifecycle. Since the quality of data is not known at Step 0, when the data is acquired, data preparation becomes an iterative debugging process and more of an art that leverages the experience of a data scientist. Because the performance of an ML model is only as good as the training data it sees, a systematic analysis of data quality before building AI/ML models is of utmost importance.

The goal of this workshop is to attract researchers working in the fields of data acquisition, data labeling, data quality, data preparation and AutoML to understand how data issues, their detection and their remediation can help build better models. With a focus on different modalities such as structured data, time series data, text data and graph data, this workshop invites researchers from academia and industry to submit novel propositions for systematically identifying and mitigating data issues to make data AI-ready.

Methods of data assessment can change depending on the modality of the data. This workshop invites submissions on data quality assessment for different modalities: structured (or tabular) data, unstructured data (such as text, logs and images), graph-structured (relational, network) data, time series data and spatio-temporal data, among others. We would like to explore state-of-the-art deep learning and AI concepts such as deep reinforcement learning, graph neural networks, self-supervised learning, capsule networks and adversarial learning to address the problems of data quality assessment for ML. The following is a (non-exhaustive) list of topics that are of interest to this workshop:
* Algorithms for assessment of data quality issues relevant to ML
* Automatic remediation of data quality issues
* Human-assisted data cleaning and remediation
* Automated data cleaning workflows
* Explainability and interpretability of quality assessment
* Interactive debugging of data
* Smarter data visualisations for high dimensional data
* Evaluation techniques for data quality assessment
* Real world use cases and applications of data quality assessment
* Novel interfaces to assist human-in-the-loop intervention for interactive data cleaning
* Quality-aware representations and sampling of high dimensional data
* Representative sampling for high dimensional data
* Detection of bias and privacy breach
* Label noise detection, explanation and incorporating feedback
* Noise and low-quality data robustness studies
* Handling corrupted, missing and uncertain data
* Outlier (or anomaly) detection and mitigation in data
* Addressing class imbalance in data
* Benchmarking of data preparation and cleaning systems and tools: data sets and frameworks

Submission Instructions:
We solicit submissions of papers of 4 to 10 pages representing reports of original research, preliminary research results, case studies, proposals for new work and position papers.
All papers will be peer reviewed, single blind (i.e. author names and affiliations should be listed). If accepted, at least one of the authors must attend the workshop to present the work. The submitted papers must be written in English and formatted in the double-column standard according to the ACM Proceedings Template, Tighter Alternate style. The papers should be in PDF format and submitted via the EasyChair submission site. The workshop website will archive the published papers. The submitted papers must not have been previously published anywhere and must not be under consideration by any other conference or journal during the workshop review process.

Important Deadlines:
Submission: May 20th, 2021
Decisions: June 10th, 2021
Workshop: August 14th-18th, 2021

Workshop Organizers:
* Hima Patel, IBM Research AI, India
* Fuyuki Ishikawa, National Institute of Informatics, Japan
* Laure Berti-Equille, IRD, ESPACE-DEV, France
* Nitin Gupta, IBM Research AI, India
* Sameep Mehta, IBM Research AI, India
* Satoshi Masuda, IBM Research AI, Japan
* Shashank Mujumdar, IBM Research AI, India
* Shazia Afzal, IBM Research AI, India
* Srikanta Bedathur, Indian Institute of Technology Delhi, India
* Yasuharu Nishi, The University of Electro-Communications, Japan

For further questions please contact the organizers at