DBWorld Message

The following 96 research papers, 23 industry papers, and 7 tutorials have been accepted for presentation and publication in the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'08).

In addition, the conference will feature 13 workshops and the following panel:

Social Networks: Looking Ahead
Chair: Ravi Kumar, Yahoo! Research
Panelists: Christos Faloutsos (Carnegie Mellon University); David Jensen (University of Massachusetts, Amherst); Jure Leskovec (Carnegie Mellon University); Gueorgi Kossinets (Cornell University)

More information about the conference, including the general schedule and registration instructions, can be found at the conference web page:

http://kdd2008.com/

The detailed program will be added to the conference web page soon. We promise you a very exciting program and look forward to your participation. The early registration deadline is July 20, 2008.

==============================
Research Track Accepted Papers
==============================

15. Cross-Domain Spectral Learning. Xiao Ling, Wenyuan Dai, Gui-Rong Xue, Qiang Yang, Yong Yu.

35. Learning Classifiers from Only Positive and Unlabeled Data. Charles Elkan, Keith Noto.

38. Automatic Record Linkage Using Seeded Nearest Neighbour and Support Vector Machine Classification. Peter Christen.

46. SPIRAL: Efficient and Exact Model Identification for Hidden Markov Models. Yasuhiro Fujiwara, Yasushi Sakurai, Masashi Yamamuro.

50. Microscopic Evolution of Social Networks. Jure Leskovec, Lars Backstrom, Ravi Kumar, Andrew Tomkins.

52. A Family of Dissimilarity Measures between Nodes Generalizing both the Shortest-Path and the Commute-time Distances. Luh Yen, Amin Mantrach, Masashi Shimbo, Marco Saerens.

62. Direct Mining of Discriminative and Essential Graphical and Itemset Features via Model-based Search Tree. Wei Fan, Kun Zhang, Hong Cheng, Jing Gao, Xifeng Yan, Jiawei Han, Philip Yu, Olivier Verscheure.

75. Mining Preferences from Superior and Inferior Examples. Bin Jiang, Jian Pei, Xuemin Lin, David W-L Cheung, Jiawei Han.

89. Structured Metric Learning for High Dimensional Problems. Jason Davis, Inderjit Dhillon.

92. Permu-pattern: Discovery of Mutable Permutation Patterns with Proximity Constraint. Meng Hu, Jiong Yang, Wei Su.

99. Partitioned Logistic Regression for Spam Filtering. Ming-Wei Chang, Wen-tau Yih, Chris Meek.

105. Finding Non-redundant, Statistically Significant Regions in High Dimensional Data: A Novel Approach to Projected and Subspace Clustering. Gabriela Moise, Joerg Sander.

106. Weighted Graphs and Disconnected Components: Patterns and a Generator. Mary McGlohon, Leman Akoglu, Christos Faloutsos.

125. Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. Yehuda Koren.

127. Discrimination-aware Data Mining. Dino Pedreschi, Salvatore Ruggieri, Franco Turini.

140. Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams. Albert Bifet, Ricard Gavaldà.

142. The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing. Justin Brickell, Vitaly Shmatikov.

149. Colibri: Fast Mining of Large Static and Dynamic Graphs. Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip Yu, Christos Faloutsos.

153. A Sequential Dual Method for Large Scale Multi-Class Linear SVMs. Sathiya Keerthi, S. Sundararajan, Kai-Wei Chang, Cho-Jui Hsieh, Chih-Jen Lin.

160. Efficient Computation of Personal Aggregate Queries on Blogs. Ka Cheung Sia, Junghoo Cho, Yun Chi, Belle L. Tseng.

163. Feedback Effects between Similarity and Social Influence in Online Communities. David Crandall, Dan Cosley, Daniel Huttenlocher, Jon Kleinberg, Siddharth Suri.

168. CutS3VM: A Fast Semi-Supervised SVM Algorithm. Bin Zhao, Fei Wang, Changshui Zhang.

169. Probabilistic Latent Semantic Visualization: Topic Model for Visualizing Documents. Tomoharu Iwata, Takeshi Yamada, Naonori Ueda.

181. Two Birds with One Stone: A Joint Model For Structured Entity Identification and Document Categorization. Indrajit Bhattacharya, Shantanu Godbole, Sachindra Joshi.

220. Categorizing and Mining Concept Drifting Data Streams. Peng Zhang, Xingquan Zhu, Yong Shi.

251. Efficient Semi-streaming Algorithms for Local Triangle Counting in Massive Graphs. Luca Becchetti, Paolo Boldi, Carlos Castillo, Aristides Gionis.

269. Angle-Based Outlier Detection in High-dimensional Data. Hans-Peter Kriegel, Matthias Schubert, Arthur Zimek.

276. Efficient Ticket Routing by Resolution Sequence Mining. Qihong Shao, Yi Chen, Shu Tao, Xifeng Yan, Nikos Anerousis.

277. Building Semantic Kernels for Text Classification using Wikipedia. Pu Wang, Carlotta Domeniconi.

289. Unsupervised Deduplication using Cross-Field Dependencies. Robert Hall, Charles Sutton, Andrew McCallum.

290. Interpretable Nonnegative Matrix Decompositions. Saara Hyvönen, Pauli Miettinen, Evimaria Terzi.

291. Constraint Programming for Itemset Mining. Luc De Raedt, Tias Guns, Siegfried Nijssen.

296. On Updates that Constrain the Features' Connections. Omid Madani, Jian Huang.

305. FastANOVA: An Efficient Algorithm for Genome-Wide Association Study. Xiang Zhang, Fei Zou, Wei Wang.

307. Fast Logistic Regression for Text Categorization with Variable-Length N-grams. Georgiana Ifrim, Goekhan Bakir, Gerhard Weikum.

318. Bridging Centrality: Graph Mining from Element Level to Group Level. Woochang Hwang.

320. Banded Structure in Binary Matrices. Gemma Garriga, Esa Junttila, Heikki Mannila.

325. Model-Based Document Clustering with a Collapsed Gibbs Sampler. Daniel Walker, Eric Ringger.

335. Constructing Comprehensive Summaries of Large Event Sequences. Jerry Kiernan, Evimaria Terzi.

340. A Bayesian Mixture Model with Linear Regression Mixing Proportions. Xiuyao Song, Chris Jermaine, Sanjay Ranka, John Gums.

342. Training Structural SVMs with Kernels using Sampled Cuts. Chun-Nam Yu, Thorsten Joachims.

347. Using Ghost Edges for Classification in Sparsely Labeled Networks. Brian Gallagher, Hanghang Tong, Tina Eliassi-Rad, Christos Faloutsos.

349. Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions. Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec.

362. Stable Feature Selection via Dense Feature Groups. Lei Yu, Chris Ding, Steven Loscalzo.

372. Noisy Multilabeling for Data Mining. Victor Sheng, Foster Provost, Panagiotis G. Ipeirotis.

378. Fast Collapsed Gibbs Sampling For Latent Dirichlet Allocation. Ian Porteous, Dave Newman, Arthur Asuncion, Alexander Ihler, Padhraic Smyth, Max Welling.

388. iSAX: Indexing and Mining Terabyte Sized Time Series. Jin Shieh, Eamonn Keogh.

389. Active Learning with Direct Query Construction. Charles Ling, Jun Du.

394. Local Peculiarity Factor and its Application in Outlier Detection. Jian Yang, Ning Zhong, Yiyu Yao, Jue Wang.

400. Locality Sensitive Hash Functions Based on Concomitant Rank Order Statistics. Kave Eshghi.

401. Composition Attacks and Auxiliary Information in Data Privacy. Srivatsava Ranjit Ganta, Shiva Kasiviswanathan, Adam Smith.

402. Scaling Up Text Classification for Large File Systems. George Forman, Shyamsundar Rajaram.

404. Entity Categorization over Large Document Collections. Arnd König, Venkatesh Ganti.

413. The Structure of Information Pathways in Social Communication Networks. Jon Kleinberg, Gueorgi Kossinets, Duncan Watts.

420. SAIL: Summation-based Incremental Learning for Information-Theoretic Clustering. Junjie Wu, Hui Xiong, Jian Chen.

426. Stream Prediction Using a Generative Model Based on Frequent Episodes in Event Sequences. Srivatsan Laxman, Vikram Tankasali, Ryen White.

434. Knowledge Transfer via Multiple Model Local Structure Mapping. Jing Gao, Wei Fan, Jing Jiang, Jiawei Han.

439. Relational Learning via Collective Matrix Factorization. Ajit Singh, Geoff Gordon.

440. Classification with Partial Labels. Nam Nguyen, Rich Caruana.

442. Volatile Correlation Computation: A Checkpoint View. Wenjun Zhou, Hui Xiong.

455. Anonymizing Transaction Databases for Publication. Yabo Xu, Ke Wang, Ada Fu, Philip Yu.

456. Can Complex Network Metrics Predict the Behavior of NBA Teams? Pedro Olmo Vaz de Melo, Virgilio Almeida, Antonio Loureiro.

460. Cut-And-Stitch: Efficient Parallel Learning of Linear Dynamical Systems on SMPs. Lei Li, Wenjie Fu, Todd Mowry, Christos Faloutsos, Fan Guo.

463. Information Extraction from Wikipedia: Moving Down the Long Tail. Fei Wu, Raphael Hoffmann, Daniel Weld.

469. Bypass Rates: Reducing Query Abandonment using Negative Inferences. Atish Das Sarma, Sreenivas Gollapudi, Samuel Ieong.

472. Community Evolution in Dynamic Multi-Mode Networks. Lei Tang, Huan Liu, Jianping Zhang, Zohreh Nazeri.

496. Knowledge Discovery of Semantic Relationships between Words Using Nonparametric Bayesian Graph Model. Issei Sato, Hiroshi Nakagawa, Minoru Yoshida.

518. Reconstructing Chemical Reaction Networks: Data Mining meets System Identification. Yong Ju Cho, Naren Ramakrishnan, Yang Cao.

537. Unsupervised Feature Selection for Principal Components Analysis. Christos Boutsidis, Michael Mahoney, Petros Drineas.

548. Effective Label Acquisition for Collective Classification. Mustafa Bilgic, Lise Getoor.

554. A Semi-Supervised Approach to Rapid and Reliable Labeling of Large Data Sets. Gyorgy Simon, Vipin Kumar, Zhi-Li Zhang.

563. Data Mining Using High Performance Data Clouds: Experimental Studies Using Sector and Sphere. Robert Grossman, Yunhong Gu.

569. Topical Query Decomposition. Francesco Bonchi, Carlos Castillo, Debora Donato, Aristides Gionis.

571. Automatic Identification of Quasi-experimental Designs for Discovering Causal Knowledge. David Jensen, Andrew Fast, Marc Maier, Brian Taylor.

576. Identifying Biologically Relevant Genes via Multiple Heterogeneous Data Sources. Zheng Zhao, Jiangxin Wang, Huan Liu, Jieping Ye, Chang Yung.

577. Anomaly Pattern Detection in Categorical Datasets. Kaustav Das, Jeff Schneider, Daniel Neill.

594. Semi-supervised Learning with Data Calibration for Long-Term Time Series Forecasting. Haibin Cheng, Pang-Ning Tan.

611. De-duping URLs via Rewrite Rules. Anirban Dasgupta, Ravi Kumar, Amit Sasturkar.

613. FAST: A ROC-based Feature Selection Metric for Small Samples and Imbalanced Data Classification Problems. Xue-wen Chen, Mike Wasikowski.

623. Asymmetric Support Vector Learning with Low Generalizable False-Alarm Rate. Shan-Hung Wu, Keng-Pei Lin, Chung-Min Chen, Ming-Syan Chen.

632. Quantitative Evaluation of Approximate Frequent Pattern Mining Algorithms. Rohit Gupta, Gang Fang, Blayne Field, Michael Steinbach, Vipin Kumar.

672. Partial Least Squares Regression for Graph Mining. Hiroto Saigo, Nicole Kraemer, Koji Tsuda.

681. Generating Succinct Titles for Web Pages. Deepayan Chakrabarti, Ravi Kumar, Kunal Punera.

685. Succinct Summarization of Transactional Databases: An Overlapped Hyperrectangle Scheme. Yang Xiang, Ruoming Jin, David Fuhry, Feodor Dragan.

686. Influence and Correlation in Social Networks. Aris Anagnostopoulos, Ravi Kumar, Mohammad Mahdian.

692. Extracting Shared Subspace for Multi-label Classification. Shuiwang Ji, Lei Tang, Shipeng Yu, Jieping Ye.

695. Effective and Efficient Itemset Pattern Summarization: Regression-based Approaches. Ruoming Jin, Muad Abu-Ata, Yang Xiang, Ning Ruan.

702. Learning Subspace Kernels for Classification. Jianhui Chen, Shuiwang Ji, Saadet Betul Ceran, Qi Li, Mingrui Wu, Jieping Ye.

750. Mining Multi-Faceted Overviews of Arbitrary Topics in a Text Collection. Xu Ling, Qiaozhu Mei, ChengXiang Zhai.

751. Joint Latent Topic Models for Text and Citations. Ramesh Nallapati, Amr Ahmed, Eric Xing, William Cohen.

758. Hypergraph Spectral Learning for Multi-label Classification. Liang Sun, Shuiwang Ji, Jieping Ye.

769. Simultaneous Tensor Subspace Selection and Clustering: The Equivalence of High Order SVD and K-Means Clustering. Heng Huang, Chris Ding, Dijun Luo.

773. A Unified Approach for Schema Matching, Coreference, and Canonicalization. Michael Wick, Khashayar Rohanimanesh, Karl Schultz, Andrew McCallum.

787. Multi-class Cost-sensitive Boosting with p-norm Loss Functions. Aurelie Lozano, Naoki Abe.

836. CCF: Combinational Collaborative Filtering for Personalized Community Recommendation. Wen-Yen Chen, Dong Zhang, Edward Chang.

850. Structured Learning for Non-Smooth Ranking Losses. Rajiv Khanna, Uma Sawant, Soumen Chakrabarti, Chiru Bhattacharyya.

========================================================
Industrial/Government Applications Track Accepted Papers
========================================================

65. Detecting Privacy Leaks Using Corpus-Based Association Rules. Richard Chow, Philippe Golle, Jessica Staddon.

80. TagMark: Reliable Estimations of RFID Tags for Business Processes. Leonardo Weiss Ferreira Chaves, Erik Buchmann, Klemens Böhm.

124. Spotting Out Emerging Artists Using Geo-Aware Analysis of P2P Query Strings. Noam Koenigstein, Yuval Shavitt, Tomer Tankel.

128. Identifying Authoritative Actors in Question-Answering Forums - The Case of Yahoo! Answers. Mohamed Bouguessa, Benoit Dumoulin, Shengrui Wang.

178. Text Classification, Business Intelligence, and Interactivity: Automating C-Sat Analysis for Services Industry. Shantanu Godbole, Shourya Roy.

183. Context-Aware Query Suggestion by Mining Click-Through and Session Data. Huanhuan Cao, Daxin Jiang, Jian Pei, Enhong Chen, Hang Li.

221. Identifying Domain Expertise of Developers from Source Code. Renuka Sindhgatta.

265. Temporal Pattern Discovery for Trends and Transient Effects: Its Application to Patient Records. G. Niklas Norén, Andrew Bate, Johan Hopstadius, Kristina Star, I. Ralph Edwards.

328. Anticipating Annotations and Emerging Trends in Biomedical Literature. Fabian Moerchen, Mathaeus Dejori, Dmitriy Fradkin, Julien Etienne, Bernd Wachmann, Markus Bundschus.

330. A Visual-Analytic Toolkit for Dynamic Interaction Graphs. Xintian Yang, Sitaram Asur, Srinivasan Parthasarathy, Sameep Mehta.

337. Using Predictive Analysis to Improve Invoice-to-Cash Collection. Sai Zeng, Prem Melville, Christian Lang, Ioana Boier, Conrad Murphy.

368. Automated Cyclone Discovery and Tracking using Multiple Heterogeneous Satellite Data. Shen-Shyang Ho, Ashit Talukder.

391. Land Cover Change Detection: A Case Study. Shyam Boriah, Vipin Kumar, Michael Steinbach, Christopher Potter, Steven Klooster.

435. Extraction and Mining of Academic Social Network. Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, Zhong Su.

466. Learning Methods for Lung Tumor Markerless Gating in Image-Guided Radiotherapy. Ying Cui, Jennifer Dy, Gregory Sharp, Brian Alexander, Steve Jiang.

563. Data Mining Using High Performance Data Clouds: Experimental Studies Using Sector and Sphere. Robert Grossman, Yunhong Gu.

593. Learning from Multi-Topic Web Documents for Contextual Advertising. Yi Zhang, Arun Surendran, John Platt.

625. Heterogeneous Data Fusion for Alzheimer's Disease Study. Jieping Ye, Kewei Chen, Teresa Wu, Jing Li, Zheng Zhao, Rinkal Patel, Min Bae, Ravi Janardan, Huan Liu, Gene Alexander, Eric Reiman.

649. Scalable and Near Real-time Burst Detection from E-commerce Queries. Nish Parikh, Neel Sundaresan.

650. Privacy-Preserving Cox Regression for Survival Analysis. Shipeng Yu, Glenn Fung, Romer Rosales, Sriram Krishnan, Bharat Rao.

688. Customer Targeting Models Using Actively-Selected Web Content. Prem Melville, Saharon Rosset, Rick Lawrence.

789. Persuasive Aspects of Visualization. Chris Chih, Stott Parker.

806. Scalable Online Ad Serving: Experimental Comparison of Simple Techniques. Brendan Kitts, Gang Wu.

=========
Tutorials
=========

J. Han, J. Lee, H. Gonzalez, X. Li, "Mining Massive RFID, Trajectory, and Traffic Data Sets"

J. Neville, F. Provost, "Predictive Modeling with Social Networks"

J. Pei, M. Hua, Y. Tao, X. Lin, "Mining Uncertain and Probabilistic Data: Problems, Challenges, Methods, and Applications"

H. Kriegel, P. Kröger, A. Zimek, "Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering"

H. Liu, N. Agarwal, "Blogosphere: Research Issues, Applications, and Tools"

X. Yan, K. Borgwardt, "Graph Mining and Graph Kernels"

R. Feldman, L. Ungar, "Applied Text Mining"