EDAM Group Reading List


Data Mining

 

Books:

(Berry 99) Michael J. A. Berry and Gordon Linoff: Mastering Data Mining - The Art and Science of Customer Relationship Management. John Wiley & Sons, 1999.

(Fayyad 96) Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, and Ramasamy Uthurusamy: Advances in Knowledge Discovery and Data Mining. AAAI Press, 1996.

(Han 00) Jiawei Han, Micheline Kamber: Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.

(Hand 00) David J. Hand, Heikki Mannila and Padhraic Smyth: Principles of Data Mining. MIT Press, Fall 2000

(Hastie 01) Trevor Hastie, Robert Tibshirani, Jerome Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Verlag, 2001.

(Weiss 97) Sholom M. Weiss and Nitin Indurkhya: Predictive Data Mining: A Practical Guide. Morgan Kaufmann, 1997.

(Witten 99) Ian Witten and Eibe Frank: Data Mining, Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 1999.

(Cherkassky 98) V. Cherkassky and F. Mulier: Learning from Data - Concepts, Theory and Methods. Wiley, 1998

 

Decision Tree and Clustering

 (Zhang 97) Tian Zhang, Raghu Ramakrishnan, Miron Livny: BIRCH: A New Data Clustering Algorithm and Its Applications. Data Mining and Knowledge Discovery 1(2): 141-182 (1997)

(Gehrke 99) Johannes Gehrke, Venkatesh Ganti, Raghu Ramakrishnan, Wei-Yin Loh: BOAT-Optimistic Decision Tree Construction. SIGMOD Conference 1999: 169-180

(Gehrke 00) Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti: RainForest - A Framework for Fast Decision Tree Construction of Large Datasets. Data Mining and Knowledge Discovery 4(2/3): 127-162 (2000)

(Ganti 99) Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan: Mining Very Large Databases. IEEE Computer 32(8): 38-45 (1999).

(Bradley 98b) P.S. Bradley, Usama Fayyad, Cory Reina: Scaling Clustering Algorithms to Large Databases. Knowledge Discovery and Data Mining, 9-15 (1998).

(Guha 98) S. Guha, R. Rastogi and K. Shim: CURE: An efficient algorithm for clustering large databases. Proceedings of ACM-SIGMOD 1998 International Conference on Management of Data, Seattle, 1998.

(Huang 97) Zhexue Huang: A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining. Proc. SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, 1997.

 

SVM

(Joachims 99) T. Joachims, Making Large-Scale SVM Learning Practical. In: Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola (ed.), MIT Press, 1999.

(Platt 99a) J. Platt, Fast Training of Support Vector Machines using Sequential Minimal Optimization, in Advances in Kernel Methods - Support Vector Learning,  B. Schölkopf, C. Burges, and A. Smola, eds., MIT Press, 1999.

(Platt 99b) J. Platt, Using Sparseness and Analytic QP to Speed Training of Support Vector Machines, in Advances in Neural Information Processing Systems 11, M. S. Kearns, S. A. Solla, D. A. Cohn, eds., MIT Press, (1999).

(Lee 01) Yuh-Jye Lee and O. L. Mangasarian: RSVM: Reduced Support Vector Machines. Proceedings of the SIAM International Conference on Data Mining, Chicago, April 5-7, 2001.

(Smola 00) A.J. Smola and B. Schölkopf. Sparse greedy matrix approximation for machine learning. In P. Langley, editor, Proc. ICML'00, pages 911-918, San Francisco, 2000. Morgan Kaufmann.

(DeCoste 99) Dennis DeCoste. Recent Advances in SMO Speed and Accuracy, NIPS99 Workshop on Learning with Support Vectors, December 1999.

(Fung 01b) Glenn Fung and O. L. Mangasarian: Proximal Support Vector Machine Classifiers. Proceedings KDD-2001, San Francisco August 26-29, 2001. Association for Computing Machinery, New York, 2001, 77-86.

(Musicant 99) O. L. Mangasarian and David R. Musicant: Successive Overrelaxation for Support Vector Machines. IEEE Transactions on Neural Networks, 10, 1999, 1032-1037.

(Musicant 01b) O. L. Mangasarian and David R. Musicant: Active Set Support Vector Machine Classification. Advances in Neural Information Processing Systems 13, Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors. MIT Press, Cambridge, MA, 2001, pages 577-583.

(Musicant 01c) O. L. Mangasarian and David R. Musicant: Lagrangian Support Vector Machines. Journal of Machine Learning Research 1, March 2001, 161-177.

 

Market Basket: Association Rule, Frequent Pattern, Interesting Pattern, etc.

You might want to read the brief review written by Zheng Huang before you drill down to the list.

Here is a more comprehensive survey by Bart Goethals.

(Agrawal 93) R. Agrawal, T. Imielinski, A. Swami: Mining Association Rules between Sets of Items in Large Databases, Proc. of the ACM-SIGMOD 1993 Intl Conference on Management of Data, Washington D.C., May 1993, 207-216.

(Agrawal 94) R. Agrawal, R. Srikant: "Fast Algorithms for Mining Association Rules", Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, Sept. 1994. Expanded version available as IBM Research Report RJ9839, June 1994.

(Agrawal 95a) R. Agrawal, H. Mannila, R. Srikant, H. Toivonen and A. I. Verkamo: "Fast Discovery of Association Rules", Advances in Knowledge Discovery and Data Mining, Chapter 12, AAAI/MIT Press, 1995.

(Brin 97) S. Brin, R. Motwani, and C. Silverstein.  Beyond market basket: Generalizing association rules to correlations. SIGMOD 97, 265-276, Tucson, Arizona.

(Han 95) J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. VLDB 95, 420-431, Zurich, Switzerland.

(Srikant 96b) R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD 96, 1-12, Montreal, Canada.

(Imielinski 00) T. Imielinski, L. Khachiyan, and A. Abdulghani. Cubegrades: Generalizing association rules. Technical Report, Aug. 2000.

(Korn 98) F. Korn, A. Labrinidis, Y. Kotidis, and C. Faloutsos. Ratio rules: A new paradigm for fast, quantifiable data mining. VLDB 98.

(Klemettinen 94) M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A.I. Verkamo. Finding interesting rules from large sets of discovered association rules. CIKM 94, 401-408, Gaithersburg, Maryland.

(Ozden 98) B. Ozden, S. Ramaswamy, and A. Silberschatz. Cyclic association rules. ICDE'98, 412-421, Orlando, FL.

(Tsur 98) D. Tsur, J. D. Ullman, S. Abiteboul, C. Clifton, R. Motwani, and S. Nestorov. Query flocks: A generalization of association-rule mining. SIGMOD 98, 1-12, Seattle, Washington.

(Zaki 00) M. Zaki.  Generating Non-Redundant Association Rules.  KDD 00.  Boston, MA.  Aug. 2000.

 (Agrawal 95b) Rakesh Agrawal and Ramakrishnan Srikant. Mining Sequential Patterns. In Proc. of the 11th Int'l Conference on Data Engineering, Taipei, Taiwan, March 1995.

(Srikant 96) Srikant, R., & Agrawal, R.: Mining sequential patterns: Generalizations and performance improvements, Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT). Avignon, France, 1996.

(Joshi 00) A. Joshi and R. Krishnapuram. On Mining Web Access Logs. In Proceedings of the 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery 2000, pp. 63-69, 2000.

(Tung 99) A. K. H. Tung, H. Lu, J. Han, and L. Feng, "Breaking the Barrier of Transactions: Mining Inter-Transaction Association Rules", Proc. 1999 Int. Conf. on Knowledge Discovery and Data Mining (KDD 99), San Diego, CA, Aug. 1999, pp. 297-301.

(Lu 00) H. Lu, L. Feng, and J. Han, "Beyond Intra-Transaction Association Analysis: Mining Multi-Dimensional Inter-Transaction Association Rules", ACM Transactions on Information Systems (TOIS'00), 18(4): 423-454, 2000.

(Brin 97a) S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In SIGMOD 97.

(Cheung 96) D.W. Cheung, J. Han, V. Ng, and C.Y. Wong. Maintenance of discovered association rules in large databases: An incremental updating technique. ICDE 96, New Orleans, LA.

(Fukuda 96) T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Data mining using two-dimensional optimized association rules: Scheme, algorithms, and visualization. SIGMOD 96, Montreal, Canada.

(Ganti 99) Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan: Mining Very Large Databases. IEEE Computer 32(8): 38-45 (1999).

(Han 97) E.-H. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. SIGMOD 97, Tucson, Arizona.

(Mannila 94) Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo: Efficient Algorithms for Discovering Association Rules. KDD Workshop 1994: 181-192.

(Miller 97) R.J. Miller and Y. Yang.  Association rules over interval data.  SIGMOD 97, 452-461, Tucson, Arizona.

(Park 95) J. Park, M. Chen, and P. Yu. An effective hash-based algorithm for mining association rules. In SIGMOD 95.

(Pasquier 98) N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. ICDT 99, 398-416, Jerusalem, Israel, Jan. 1999.

(Sarawagi 98) S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. In SIGMOD 98.

(Savasere 95) Ashok Savasere, Edward Omiecinski, and Shamkant Navathe. An efficient algorithm for mining association rules in large databases. In 21st Int'l Conf. on Very Large Databases (VLDB), Zurich, Switzerland, Sept. 1995.

(Savasere 98) Savasere A., Omiecinski E., and Navathe S. B. "Mining for Strong Negative Associations in a Large Database of Customer Transactions." Proceedings of the International Conference on Data Engineering, February 1998.

(Silverstein 98) C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures.  VLDB'98, 594-605, New York, NY.

(Toivonen 96) H. Toivonen. Sampling large databases for association rules. In VLDB 96.

(Yoda 97) K. Yoda, T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Computing optimized rectilinear regions for association rules. KDD 97, Newport Beach, CA, Aug. 1997.

(Zaki 97) M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithm for discovery of association rules. Data Mining and Knowledge Discovery, 1:343-374, 1997.

(Zaki 99) M. Zaki. CHARM: An Efficient Algorithm for Closed Association Rule Mining, CS-TR99-10, Rensselaer Polytechnic Institute.

(Zaki 01) Fast Vertical Mining Using Diffsets, TR01-1, Department of Computer Science, Rensselaer Polytechnic Institute.

 (Bayardo 98) R. J. Bayardo. Efficiently mining long patterns from databases. SIGMOD 98, 85-93, Seattle, Washington.

(Srikant 96) Srikant, R., & Agrawal, R.: Mining sequential patterns: Generalizations and performance improvements, Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT). Avignon, France, 1996

(Yang 02) Mining long sequential patterns in a noisy environment, by Jiong Yang, Wei Wang, Philip Yu, and Jiawei Han, Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 406-417, 2002.

(Ayres 02) Jay Ayres, J. E. Gehrke, Tomi Yiu, and Jason Flannick. Sequential PAttern Mining Using Bitmaps. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Alberta, Canada, July 2002.

(Shintani 98) T. Shintani and M. Kitsuregawa. Mining algorithms for sequential patterns in parallel: Hash based approach. Second Pacific-Asia Conference on Knowledge Discovery and Data Mining, April 1998.

(Pei 01) J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal and M-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In Proc. 2001 Int. Conf. Data Engineering (ICDE'01), pages 215-224, Heidelberg, Germany, April 2001.

(Zaki 98) M.J. Zaki. Efficient enumeration of frequent sequences. CIKM 98. November 1998.

(Zaki 01a) M. Zaki. SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, 42(1/2):31-60, 2001.

 

 

Time-series analysis

Books:

(Box 94) G. Box, G. Jenkins, and G. Reinsel. Time Series Analysis: Forecasting and Control. Prentice Hall, Englewood Cliffs, NJ, 1994. 3rd Edition.

(Weigend 94) A. Weigend and N. Gershenfeld. Time Series Prediction: Forecasting the Future and Understanding the Past. Addison Wesley, 1994.


Papers:

(Das 98) G. Das, K.-I. Lin, H. Mannila, G. Renganathan, and P. Smyth: Rule Discovery from Time Series. In Proc. of KDD '98, Aug 1998.

(Faloutsos 94) C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases. Proc. ACM SIGMOD, 419-429, May 1994.

(Faloutsos 97) Faloutsos, C., Jagadish, H., Mendelzon, A. & Milo, T. (1997). A signature technique for similarity-based queries. In proceedings of the Int'l Conference on Compression and Complexity of Sequences. Positano-Salerno, Italy, Jun 11-13.

(Keogh 01) Keogh, E., Chu, S., Hart, D. and Pazzani, M: An Online Algorithm for Segmenting Time Series. In Proceedings of IEEE International Conference on Data Mining. pp 289-296, 2001.

(Agrawal 93b) Agrawal, R., Faloutsos, C. & Swami, A. (1993). Efficient similarity search in sequence databases. In proceedings of the 4th Int'l Conference on Foundations of Data Organization and Algorithms. Chicago, IL, Oct 13-15. pp 69-84.

(Agrawal 95c) Agrawal, R., Lin, K. I., Sawhney, H. S. & Shim, K. (1995). Fast similarity search in the presence of noise, scaling, and translation in time-series databases. In proceedings of the 21st Int'l Conference on Very Large Databases. Zurich, Switzerland, Sept. pp 490-501.

(Chan 99) Chan, K. & Fu, A. W. (1999). Efficient time series matching by wavelets. In proceedings of the 15th IEEE Int'l Conference on Data Engineering. Sydney, Australia, Mar 23-26. pp 126-133.

(Chu 99) Chu, K. & Wong, M. (1999). Fast time-series searching with scaling and shifting. In proceedings of the 18th ACM Symposium on Principles of Database Systems. Philadelphia, PA, May 31-Jun 2. pp 237-248.

(Loh 00) Loh, W., Kim, S. & Whang, K. (2000). Index interpolation: an approach to subsequence matching supporting normalization transform in time-series databases. In proceedings of the 9th ACM CIKM Int'l Conference on Information and Knowledge Management. McLean, VA, Nov 6-11. pp 480-487.

(Popivanov 02) Popivanov, I. & Miller, R. J. Similarity search over time series data using wavelets. In proceedings of the 18th Int'l Conference on Data Engineering. San Jose, CA, Feb 26-Mar 1.

(Rafiei 98) Rafiei, D. & Mendelzon, A. O. Efficient retrieval of similar time sequences using DFT. In proceedings of the 5th Int'l Conference on Foundations of Data Organization and Algorithms. Kobe, Japan, Nov 12-13.

(Rafiei 99) Rafiei, D. On similarity-based queries for time series data. In proceedings of the 15th IEEE Int'l Conference on Data Engineering. Sydney, Australia, Mar 23-26. pp 410-417.

(Wu 00) Wu, Y., Agrawal, D. & El Abbadi, A. (2000). A comparison of DFT and DWT based similarity search in time-series databases. In proceedings of the 9th ACM CIKM Int'l Conference on Information and Knowledge Management. McLean, VA, Nov 6-11. pp 488-495.

 

Mining Evolving Data and Streams

(Ester 98) M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, X. Xu: Incremental Clustering for Mining in a Data Warehousing Environment. Proc. 24th Int. Conf. on Very Large Data Bases, New York, 1998, pp. 323-333.

(Ganti 00) Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan. DEMON: Mining and Monitoring Evolving Data. In ICDE 2000: 439-448, San Diego, CA.

(Yi 00) B.-K. Yi, N. D. Sidiropoulos, T. Johnson, A. Biliris, H. V. Jagadish and C. Faloutsos. Online Data Mining for Co-Evolving Time Sequences. In Proceedings of the IEEE Sixteenth International Conference on Data Engineering, pages 13-22, 2000.

(Veloso 02) A. Veloso, W. Meira, M. Carvalho, B. Possas, S. Parthasarathy and M. Zaki, Efficiently mining approximate models of associations in Evolving Databases, to appear in ECML/PKDD 2002.

(Cortes 00) C. Cortes, K. Fisher, D. Pregibon, and A. Rogers. Hancock: A Language for Extracting Signatures from Data Streams. In Proceedings of the Association for Computing Machinery Sixth International Conference on Knowledge Discovery and Data Mining, pages 9-17, 2000.

(Lee 98) Wenke Lee and Sal Stolfo. "Data Mining Approaches for Intrusion Detection". In Proceedings of the Seventh USENIX Security Symposium (SECURITY '98), San Antonio, TX, January 1998.


 Trajectory Data Mining and Management


(WOL02) Ouri Wolfson, Moving Objects Information Management: The Database Challenge, In Proc. of the 5th Workshop on Next Generation Information Technologies and Systems, 2002


(GBEJ00) Ralf Hartmut Güting, Michael H. Böhlen, Martin Erwig, Christian S. Jensen and Nikos A. Lorentzos, A Foundation for Representing and Querying Moving Objects, In ACM Transactions on Database Systems, 2000


(TP02) Yufei Tao and Dimitris Papadias, Time-Parameterized Queries in Spatio-Temporal Databases, In Proc. of ACM SIGMOD, 2002

(SS03) Shashi Shekhar and Sanjay Chawla, Spatial Databases: A Tour, Prentice Hall, 2003 (ISBN 013-017480-7). Chapter 7: Introduction to Spatial Data Mining.

(SJLL00) Simonas Saltenis, Christian S. Jensen, Scott T. Leutenegger and Mario A. Lopez, Indexing the Positions of Continuously Moving Objects, 2000

(TSPM98) Yannis Theodoridis, Timos K. Sellis, Apostolos Papadopoulos and Yannis Manolopoulos, Specifications for Efficient Indexing in Spatiotemporal Databases, In Proceedings of the 10th International Conference on Scientific and Statistical Database Management, 1998

(VKG02) M. Vlachos, G. Kollios, D. Gunopulos, Discovering similar multidimensional trajectories, In Proc. of ICDE, 2002


Constraints on Association Rules

Mining Incorporating Prior/Post Knowledge
Subset Mining