The sciences are experiencing an unprecedented explosion in the amount of available data, and traditional data analysis methods cannot cope with this volume. There is an urgent need to automate the process of refining scientific data into scientific knowledge. Inductive logic programming (ILP) is a data analysis framework well suited to this task. We have applied ILP to analyse data in a large and complex bioinformatic database, and learnt rules that accurately predict the functional class of a protein from its sequence. This method has the advantage over conventional approaches of working even in the absence of homologous proteins of known function. For the open reading frames (ORFs) of unassigned function in the E. coli genome, the method predicts a functional class for ~40% at the coarsest level of the functional classification, with an estimated accuracy of ~60%, and for ~10% at the most detailed level, with an estimated accuracy of ~70%.
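
The two figures quoted for each level of classification measure different things: the fraction of sequences for which some learnt rule makes a prediction, and the fraction of those predictions that are correct. The Python sketch below is only an illustration of that distinction, not the ILP system or the rules used in this work. The rule bodies, the attribute names (`length`, `hydrophobic_fraction`), the class labels and the toy data are all hypothetical, and the sketch assumes that accuracy is estimated on proteins whose functional class is already known.

```python
from typing import Callable, Dict, List, Optional, Tuple

Protein = Dict[str, float]                 # hypothetical sequence-derived attributes
Rule = Callable[[Protein], Optional[str]]  # returns a class label, or None if the rule does not fire


def predict(protein: Protein, rules: List[Rule]) -> Optional[str]:
    """Return the label of the first rule that fires, or None if no rule applies."""
    for rule in rules:
        label = rule(protein)
        if label is not None:
            return label
    return None


def coverage_and_accuracy(
    proteins: List[Protein], true_classes: List[str], rules: List[Rule]
) -> Tuple[float, float]:
    """Coverage: fraction of proteins that receive a prediction.
    Accuracy: fraction of the predictions made that match the known class."""
    predictions = [predict(p, rules) for p in proteins]
    fired = [(pred, truth) for pred, truth in zip(predictions, true_classes)
             if pred is not None]
    coverage = len(fired) / len(proteins)
    accuracy = sum(pred == truth for pred, truth in fired) / len(fired) if fired else 0.0
    return coverage, accuracy


# Hypothetical rules, invented for illustration only.
def rule_transport(p: Protein) -> Optional[str]:
    return "transport" if p["hydrophobic_fraction"] > 0.5 and p["length"] > 300 else None


def rule_regulation(p: Protein) -> Optional[str]:
    return "regulation" if p["length"] < 150 else None


# Toy proteins of known class (made-up values).
toy_proteins = [
    {"length": 420, "hydrophobic_fraction": 0.61},
    {"length": 120, "hydrophobic_fraction": 0.30},
    {"length": 250, "hydrophobic_fraction": 0.40},
    {"length": 500, "hydrophobic_fraction": 0.35},
]
toy_classes = ["transport", "metabolism", "metabolism", "transport"]

coverage, accuracy = coverage_and_accuracy(
    toy_proteins, toy_classes, [rule_transport, rule_regulation]
)
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

On this toy set a rule fires for two of the four proteins and one of the two predictions is correct, so both quantities are 0.5. Sequences for which no rule fires are left unpredicted rather than counted as errors, which is why coverage and accuracy are reported separately.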