We are very glad to announce three releases on fuzzy string matching.
1. Flamingo Source-Code Package (version 3.0) on approximate string
matching
http://flamingo.ics.uci.edu/releases/3.0/
URLs of earlier DBWorld messages:
Version 1.0: http://www.cs.wisc.edu/dbworld/messages/2007-04/1176855447.html
Version 2.0: http://www.cs.wisc.edu/dbworld/messages/2008-10/1224008939.html
Main changes in this version:
* Added Compressed Indexers based on the Techniques from:
"Space-Constrained Gram-Based Indexing for Efficient Approximate
String Search", by Alexander Behm, Shengyue Ji, Chen Li, and
Jiaheng Lu, in ICDE 2009
* Added Module for Top-K Approximate String Search from: "Efficient
top-k algorithms for fuzzy search in string collections", by Rares
Vernica, Chen Li, in KEYS 2009: 9-14. (Workshop on Keyword Search
on Structured Data, collocated with SIGMOD 2009)
* Added Disk-Based Inverted Index, Disk-Based StringContainer and
Efficient Search Algorithms using the Disk-Based Components from:
"Answering Set-Similarity Selection Queries on Large Disk-Resident
Data Sets", by Alexander Behm, Chen Li, Michael J. Carey, UCI
Technical Report 2010
* Added Some Auto-Tuning Features, e.g. Automatic Choice of
Partitioning Filter
Main contributors in this new release:
Alexander Behm, Rares Vernica, Shengyue Ji, and Chen Li.
2. Source code for Parallel Set-Similarity Joins Using MapReduce
http://asterix.ics.uci.edu/fuzzyjoin-mapreduce/
Its techniques are described in the SIGMOD 2010 paper titled:
"Efficient Parallel Set-Similarity Joins Using MapReduce", by Rares
Vernica, Michael J. Carey, Chen Li.
3. Demos on Fuzzy Keyword Search on Spatial Data (Maps)
http://flamingo.ics.uci.edu/localsearch/fuzzysearch/
Its techniques are described in the DASFAA 2010 demo paper titled:
"Fuzzy Keyword Search on Spatial Data", by Sattam Alsubaiee and
Chen Li.
Chen Li
UC Irvine