B. External interface
From the users' point of view, the GUI itself is the interface. Figure 2 is a screen capture of Minirel's main GUI. Users provide input by typing on the keyboard or by selecting from the menus. From the Minirel DBMS's point of view, there is no awareness of any external interface provided by the GUI, because the Minirel process is in fact run as an X window application. The internal event loop within the application detects user actions and invokes the associated commands. The Minirel DBMS plays a passive role. Even for the utilities, the DBMS only takes SQL statements and processes them. In no case are any GUI modules invoked by the Minirel DBMS.
Figure 2. Main GUI
Internally, each individual component of the frontend system does have its own public interface. Notably, each GUI and the integration layer provide a fairly complete set of public interfaces. The following sections attempt to go through each of them:
The design of Minirel's GUI evolved as our Tcl/Tk skills matured and the application environment changed. Eventually, it was decided that a minimal amount of information should be displayed, to avoid user distraction and to simplify the synchronization between displayed information and its internal representation, e.g. the name of the database in use, or the name of the relation opened. Altogether, there are four major GUIs, the main GUI, the browser GUI, the utility GUI, and the QBE GUI, designed and implemented for the corresponding components of the frontend system.
The layout of the main window's GUI is very straightforward. Referring back to Figure 2, our screen capture of the GUI, the main window is labeled "Minirel Query Manager." Buttons have self-explanatory labels on them. Along the top of the main window there is a menu bar with three menu options: "File," "Tool," and "Utility." Each of these menu options represents a functional group of the frontend system. Opening and closing of a database are commands universally placed under "File." The query editor and condition box are tools for composing queries, so the commands for enabling them are under "Tool." The "Utility" submenu gives access to all the database utilities that Minirel supports, including creating a database, deleting a database, creating a relation, deleting a relation, adding an index, dropping an index, inserting a record, deleting a record, and bulkloading. Other than the buttons and the menu bar, there is also a label indicating the current active database, and a log window displaying the messages generated by Minirel as a result of any user-issued commands.
The browser GUI presents to the user all available catalog information of a database. A sample state of the browser GUI is shown in Figure 3. It contains three major components: a database listbox, a relation listbox, and an attribute listbox. Depending on user selections, each listbox dynamically updates its list of entries. The small index-type menu bar at the bottom left of the browser lets the user see which index type is built for a specific attribute.
The utility GUI basically builds on top of the listbox components of the browser. Instead of serving as a tool for browsing catalog information, as in the browser, the listboxes are used as input tools to specify the target of a utility operation, e.g. which database to delete when calling dbdelete. A customized utility GUI is provided for each specific utility as needed. Figure 4 is a sample of the customized utility GUI for adding a relation.
Figure 3. Browser GUI
Figure 4. Utility GUI for adding a relation
A simple QBE GUI is provided in the query manager for entering QBE queries. Largely following the specification of QMF, each relation table contains the standard command column, attribute columns, an optional condition box, and optional unnamed columns. Less obvious features include an arbitrary number of row entries and unnamed columns. In addition to these must-have QBE features, there are also enhancements over the original QBE. For instance, users can now configure their table skeletons through point and click, and arbitrary attributes in any relation table can be turned on or off. Most importantly, we support a query editor. Before the SQL statement is submitted to the DBMS, it can be edited manually in any way. Figure 5 shows a sample of the QBE GUI with the condition box and query editor enabled. The merits of this query editor will be discussed later.
Figure 5. QBE GUI
Aside from the GUI, the frontend system work also includes the building of an integration layer for Tcl/Tk/C++, and the database utilities. Though they require only a moderate amount of implementation and leave little room for design, for the sake of completeness we briefly describe the design of each of them.
Integration of the Tcl/Tk/C++ code was fairly straightforward, once the template file was located. A set of initialization routines was written to register new Tcl commands with the Tcl interpreter. A new class, Browser, was built to manage the catalogs in order to handle the information requests from the browser, or specifically from each of its listboxes. In short, there is a set of new Tcl commands built for the routines in class Browser so that the Tcl scripts can use them. Figures 6 and 7 show the header file for class Browser and the header file declaring the new Tcl routines, respectively. Class Browser is very similar to class maincatalog; at the time of implementation, class maincatalog had not yet been designed, so class Browser was implemented to provide a similar set of functionality.
There are two main routines for integrating the Minirel DBMS and the GUI layer into the extended Tcl/Tk libraries, which contain all the commands necessary to interpret the Tcl/Tk scripts. These two routines, Init_Minirel() and Init_GUI(), are defined in gui.h, as shown in Figure 7. They contain the routines necessary to initialize the entire DBMS and the frontend, respectively. Note that Init_Minirel() is used by the database utilities as well, because the Minirel database utilities run in a separate UNIX address space.
As mentioned, the Minirel database utilities are standalone UNIX executables. Each can be viewed as a special instance of the Minirel DBMS that executes one single command only. They are needed because the current implementation of the parser and planner does not yet support DDL commands. They can be run from the UNIX shell or from the Minirel frontend system. There is really no external interface to these utilities. The syntax for using them is given in the appendix.
#ifndef BROWSER_H
#define BROWSER_H
#include "maincatalog.h"
#include "util.h"
#include "db.h"
#define WORKINGDIR "/afs/cs.wisc.edu/u/grad/dmw/cs784/proj"
// Browser class only exists as an instance of a utility operation
class Browser {
private:
// internal bookkeeping
// char DBname[MAXLENGTH];
// char RELname[MAXLENGTH];
// char ATTRname[MAXLENGTH];
// some state info.
Error error;
DB* OpenedDB;
RelCatalog* OpenedRelCat;
AttrCatalog* OpenedAttrCat;
IndexCatalog* OpenedIndexCat;
public:
// Open DB, RelCat, and Attr Cat
Browser();
// clean up all opened resources, if any
~Browser();
// return list of databases available
Status GetDBList(char* dblist);
// return list of relations in the opened db
Status GetDBInfo(char* dbname, char** rellist, int &dbcnt, int MAXCOUNT);
// return list of attr. in the opened relation
Status GetRELInfo(char* relname, char** attrlist, int &relcnt, int MAXCOUNT);
// return specific info on an attr.
Status GetATTRInfo(char* relname, char* attrname, int &cnt, IndexType *indexrec);
// prints record in an opened relation
Status Print();
// return pointer to current DB, relations, attributes, respectively
DB* CurrentDB() { return OpenedDB;}
RelCatalog* CurrentREL() { return OpenedRelCat;}
AttrCatalog* CurrentATTR() { return OpenedAttrCat; }
IndexCatalog* CurrentINDEX() { return OpenedIndexCat;}
Status ResetStates();
};
#endif // BROWSER_H
Figure 6. Class Browser in browser.h
#ifndef GUI_H
#define GUI_H
#include "tclExtend.h"
#include "catalog.h"
#include "util.h"
#include "browser.h"
#include "db.h"
#include "error.h"
// initialize browser
int
Init_Browser(ClientData clientdata, Tcl_Interp *interp,int argc,char* argv[]);
// clean up browser
int
Close_Browser(ClientData clientdata, Tcl_Interp *interp,int argc,char* argv[]);
// set of tclcreate_cmd extensions to tcl
int MR_CreateCmd(Tcl_Interp* interp) ;
// calling misc functions to init GUI
int Init_GUI(Tcl_Interp* interp) ;
// init all managers
Status Init_Minirel();
// open the db
int DBopen(ClientData clientdata, Tcl_Interp *interp, int argc,char* argv[]);
// close the db
int DBclose(ClientData clientdata, Tcl_Interp *interp, int argc,char* argv[]);
int ExecuteQuery(ClientData clientdata, Tcl_Interp *interp,
int argc,char* argv[]);
int getRELinfo(ClientData clientdata, Tcl_Interp *interp,
int argc,char* argv[]);
int getDBinfo(ClientData clientdata, Tcl_Interp *interp,
int argc,char* argv[]);
int getATTRinfo(ClientData clientdata, Tcl_Interp *interp,
int argc,char* argv[]);
// get relation info without browser
int QRELinfo(ClientData clientdata, Tcl_Interp *interp,int argc,char* argv[]);
// define some global variables
#define GLOBAL_DB db
#define GLOBAL_ERROR error
#define GLOBAL_RELCAT relCat
#define GLOBAL_ATTRCAT attrCat
#define GLOBAL_INDEXCAT indCat
#endif // GUI_H
Figure 7. New Tcl commands in gui.h
Translator data structure
Since all data structures in Tcl are represented as lists, the data structures of the translation algorithm are in terms of lists too: information is passed around as lists and kept in lists. For simplicity, there is no attempt to derive a generic data structure for the translator. Each query command basically has its own data structure and data format. Use of a common data structure is limited to the communication layer between the base and the relation tables, and to the basic variable replacement mechanism, which will be described in later sections.
In QBE, each box is an entry. NULL entries are kept for completeness, but they do not have any effect on the generated query because the current parser cannot extract any useful information from them. Each row in a relation table is associated with a unique alias of the relation. For example, if the rows are in the relation table for relation A, all attributes in row #1 are associated with alias A1, while all attributes in row #2 are associated with alias A2. In terms of logic programming, each row can be viewed as a tuple variable, while the variables within each tuple variable are subgoals. The mission of the parser is to resolve these subgoals by matching each with an attribute. The rationale is that one of the instances of a variable must simply be associating itself with an attribute. Identifying such an instance allows us to replace all appearances of the variable by its associated attribute. Once constraints are all bound to constants or attributes, it is just a matter of extracting the relation information, attribute list, and any embedded commands (e.g. aggregate commands) from the data to form the SQL query.
Translation algorithm
This section contains an in-depth treatment of how DML queries are parsed. An outline of the translation algorithms for the other commands, together with a reference on the specific data formats used, can be found in the appendix.
The parser consists of a small set of generic routines, modules that are commonly used, and customized routines, modules that are called by one specific query command only. Figures 11 and 12 detail the dataflow of how the different query command modules process the entry information. Due to space limits, we will only discuss here the translation of the query command P., which actually covers the entire set of DML, including selection, projection, union, negation, and join.
The parsing stage begins when the user requests the generation of a query at the base. The base responds by broadcasting a request to every opened relation table to send its entry information. The different components then go through the following steps.
Local parsing at relation tables
Here is part of the local parser algorithm, as shown in Figure 13:
0) Read the command column to check for P.; if found, it is the special case of projecting all attributes in a relation table, so format the entries as if a P. were placed at every column
1) Read the command column to check for I. or D.; if found, call the corresponding GetInsertEntryCmd/GetDeleteEntryCmd and send the result to the base
2) If there is no info in the command column, read in all the available attribute column entries and pack them as an (attribute, value) pair for each entry; empty ones have {}
3) Check for U. in the list of pairs:
if found, call the corresponding GetUpdateEntryCmd and send the result to the base,
if not found, just pack the entry info with command UNKNOWN. to the base, and let it decide
Note: a brief outline of the algorithm used in each specialized GetXXXEntryCmd is in the appendix. Further detailed documentation is online with the source code.
Central processing at the base:
The base is basically the main translator. Upon user request, the base begins the query generation process by proceeding through the following stages:
1) Request stage:
The base gathers entry information from all opened relation tables by sending each active table a send-data request, as described above.
2) Receive stage:
The base arranges the received information by moving the lists that carry commands to the beginning. As a result, entry lists with UNKNOWN. as their command are placed at the end. This is needed because entry lists with command UNKNOWN. do not contain commands; these entries only help in forming the queries. The stage ends when all local parsers have responded with their respective entry info.
3) Parse stage:
After collecting all the commands from all opened relation tables, the base processes the command list by checking for incompatible commands, then determines the command to be performed and invokes the corresponding ProcessXXXCmd routine with the entry lists gathered.
4) Translate stage:
Depending on the query command, this lengthy stage can be very different; let us examine the processing of DML queries in detail. Note that in case another query command is found, the corresponding ProcessXXXCmd is invoked instead.
Mechanism for translating DML queries
ProcessProjCmd - processing DML queries
This routine begins by checking whether the DML query requires a union, which is the case exactly when the command P. is found in more than one row of a relation table.
If a union is not required, then it proceeds through the following steps:
1) Do the variable substitutions,
2) Process any negation commands,
3) Extract projection attributes,
4) Extract selection constraints,
5) Extract condition box entries,
6) Extract aliases of relation used for negation and projection,
7) Perform string manipulation to put the information together to form a valid SQL statement
Among the above steps, variable substitution and the processing of negation commands may need further explanation. Figure 11 contains a detailed dataflow of ProcessProjCmd. Basically, variable substitution is based on the observation that in a valid QBE query, for each variable there must be a single entry that associates the variable with a specific attribute. Following the assumption that the input is correct, all the (attribute, value) pairs are scanned once to build a hash table associating each variable with an attribute, using the above characteristic. A second scan then replaces each occurrence of a variable with its associated attribute in the string table. Processing of negation requires a little more work. A subquery is generated using the same translation technique as for a projection/selection query, except that we assume no nested negation or union within this subquery, thus avoiding recursive invocation of modules. This step ends with the formation of a not exists subclause.
In the case that a union is required, for each row with a P. command in it, the base discards the other rows' projection information, performs the above parsing steps, and joins the results of the parses together by inserting the keyword "union" in between.
The projection algorithm described above can be extended appropriately to handle the insert, delete, and update commands. Interested readers are advised to see the appendix for details on processing these query commands. Figure 12 also illustrates the dataflow schema of these commands.
Implementation status
The following is a summary of the features that the current implementation of the parser does and does not support:
Supported features
1) P. with any combination of constants, variables, unions, negation, unnamed columns, and/or condition box entries
2) I., D., U. with multiple statements, constants and/or variables, and a limited form of subquery, as mentioned above
3) G., the group by clause, as a sample implementation of a simple extension of the parser toward handling aggregate commands and qualifiers
4) limited error checking
Not supported features:
1) aggregate functions - i.e. SUM. AVE. MIN. MAX. CNT.
2) qualifiers - i.e. UN. ALL. AO. DO.
3) union and negation in subqueries of insert, delete, and update queries
4) queries that require nested select's
5) total error detection
Comments
Before hand-waving away these neglected/incomplete features of QBE, let us constructively suggest how they could be added to the current implementation of the parser.
1) Aggregate functions: extend the parser by adding a preprocessing stage that replaces these keywords with the corresponding SQL keywords and puts brackets around the next item in the entry.
2) Qualifiers: extend the parser by setting flags in the projection processing routines to generate an additional keyword in the SQL statement, since qualifiers are basically just additional keywords at predefined locations in the SQL statement.
3) Union and negation in insert, delete, and update subqueries: require some modification of structure, as mentioned in the section above.
4) Nested select's: we are not aware of any valid QBE query that requires a nested select in its corresponding SQL statement yet, but we are not ready to disprove the possibility either.
5) Total error detection: this basically requires ad-hoc checks of every entry and then of conflicting entries, and each entry input must be processed. Due to time constraints, this feature is incomplete. It is also not the main purpose of the project; we leave it to the future work section.
Adding each of these features is an easy task, since all the necessary tools are in place; only the bookkeeping of the new status variables must be carefully maintained.
Retrospective
The current implementation of the parser is heavily influenced by the continuous evolution of design decisions. Essentially, the overall structure is the result of continuous extension and generalization of the simple string manipulation tools first implemented. Thus a brief history of the work done is very helpful in understanding the motivation for the parser's structure.
The project started in mid February, when the design proposal was to be drafted. After some planning and background reading, a proposal was turned in. The original proposal included a data structure in C, intended for building the hash table for the variable substitution stage. As soon as the author realized there is a built-in hash table in Tcl, it was decided that the entire project would be done in Tcl, with integration to C code for generating test data. A code review of Vivek's GUI for Garlic was completed first, to check whether there was any special technique that would aid in parsing queries. Unfortunately, the Garlic project had different implementation priorities, so the essential routine, variable substitution for join operations in QBE, was not yet supported at the time the code review was done. The tabular form of the QBE GUI was also distinct from the multimedia interface Garlic provides. Thus work on the GUI was started from scratch.
Since early March there was intense programming to get the code base for the QBE GUI completed in order to start working on the translator. Later on, part of this GUI was actually extended to become Minirel's frontend GUI. By late March, most of the GUI and the communication protocol, after a little trouble, were completed. The first goal of the project was to parse simple selects with only constants as entry input, then simple projects and joins. This is when the variable substitution module was built; it required minimal modification from then on, because the routine is a little complicated, with many details taking care of special cases such as empty entries, end-of-row markers, etc.
The initial GUI looked like the GUI provided by Paradox, in the sense that projection commands were issued by clicking checkboxes at the corresponding attribute column. But the author later learned that straight QBE syntax was to be followed. That changed most of the modules implemented, except variable substitution. As soon as simple DML queries were tested, attention switched to parsing inserts, deletes, and updates for entries with constants only. The DDL query commands require similar parsing techniques, except that they have distinct SQL statements as output. Modules were largely borrowed from the ones implemented for the DML queries.
The second round of implementation involved doing DDL with subqueries. Unfortunately, code was copied from the DML routines rather than called by the DDL routines. This is where it started getting ugly. As a result of developing the parsing routines for each query command independently, the code bases for similar routines started to diverge, and each set of modules made different assumptions about the data format. It often required a moderate amount of changes to modules whenever a new feature was added somewhere. This is the reason DML supports negation and union now, but the routines for DDL queries do not. The final round of implementation involved handling union and negation for DML queries, and multiple SQL statements for DDL commands. The current source code, as a result of these stages of modification, also contains code for routines that were previously used.
Fortunately, in spite of the setback, the code base still appears very modular. In fact, this design approach has its strengths too: future people working on the code will be able to work on one specific command at a time. It is indeed very hard to identify a set of routines whose pattern all query commands would share.
The main weakness of the parser is error detection. Since new features kept being added, there was no reason to do error checking before the entire parser, along with all the planned features, was completed. Also, the size of the source code suffers from the use of slowly evolving modules. A better overall design could possibly simplify the application development. For example, in some cases, such as generating subqueries in insert and delete, the DDL modules should simply have been able to call a more general version of ProcessProjCmd to form the subquery, instead of developing sets of smaller modules for each query command to handle it. This is not possible now because the many minor syntax differences of these DDL queries in SQL make it impractical to build a module general enough to build the subquery for every query command.
Accomplishments
The project amounted to months of intense and focused implementation work, from the time the proposal was turned in to the finalizing of the GUI and the QBE translator. At each stage of the project we improved upon our old design and attempted to attain a better looking GUI and a more versatile QBE to SQL translator. After every change of project status we attempted to do the integration; indeed, this was when we found some of the name conflicts and type conflicts. Every effort was made to make sure the frontend system worked along with the other groups' work from beginning to end. We let the completed work speak for the amount of work we have done. Starting from scratch (Vivek's code was thought to be structurally too far off from the QBE GUI design), and with a set of Tk library routines, the project now, with documentation, is altogether ~7k lines. For comparison, we are aware that two years ago, in Prof. Yannis's CS784, there was a QBE project that required a three-person team to support a subset of QBE; then again, they were writing the entire application in C. We want to emphasize both the hard effort and the convenient programming tools we had.
Testing
Testing of the QBE to SQL translator and of the Minirel frontend GUI was done separately. Sample QBE test queries are drawn from the two QBE references; these sample queries serve to test most forms of QBE queries. Additional test cases were also generated. Regarding the Minirel frontend, several dummy layers, i.e. buffer manager, heap file, and catalog layers, were built to test the integration with the Tcl code. Indeed, only the browser required a fair amount of testing with the GUI, because all utilities are built as command line UNIX tools and are thus very simple to write and debug using gdb.
Retrospective
The project was undoubtedly an excellent teamwork experience with the other groups. We have suggested some ways to possibly improve the project: basically, to enforce more checkpoints and to come up with a better design approach.
For the translator, more generalization could have been done to save some work, by observing that the constant case of DDL can be treated as a special case of a select distinct query. By making all DDL statements contain subqueries, we could have saved the work of implementing another three sets of dataflow modules for the special case of queries that contain only constants as input.
References
- Introduction to Database Systems - QBE and QBF
- Introduction to Tcl/Tk
- QBE manual, IBM QMF Reference
- Jim Melton and Alan R. Simon, Understanding the New SQL
- PESTO, the GUI for Garlic
Appendix
%Utilities command references
- dbcreate [database name]
- dbdelete [database name]
- relcreate [database name] [relation name] [attribute datatype size] ...
datatype = string, int, or real
- reldelete [database name] [relation name]
- indexcreate [database name] [relation name] [attribute indextype size]..
indextype = hash or b_tree
- recinsert [database name] [relation name] [attribute value] ...
- recdelete [database name] [relation name] [attribute value]
- bulkload [database name] [relation name] [file name]
%Format of <attribute, value> pair
attribute = (relation name(N)).(attribute name) N = row#
value = raw entry in the corresponding box
%Format of <entry info>:
- P., or {} found:
[cmd, [ [attribute, value] ] ]
cmd = P. or UNKNOWN.
- I., D., or U. found:
[insert, list of [insertcmd, list of [attribute, value] pairs]]
insertcmd = insertc, or insertv
deletecmd = deletec, or deletev
updatecmd = updatec, or updatev
%Outline of GetXXXEntryCmd [ XXX denotes command found ]
- GetInsertEntryCmd
Read the entries row by row.
Determine the type of insert by scanning each row to see if any variable is involved:
1) If not, tag insertc onto the data list and send it to the base
2) If so, tag insertv onto the data list and send it to the base
The results for each row are appended together and sent to the base.
- GetDeleteEntryCmd
Determine the type of delete by scanning each row to see if any variable is involved:
1) If not, tag deletec onto the data list and send it to the base
2) If so, tag deletev onto the data list and send it to the base
The results for each row are appended together and sent to the base.
- GetUpdateEntryCmd
Similar in concept to GetInsertEntryCmd and GetDeleteEntryCmd, except that it must identify the attribute to be updated, then determine the type of update by scanning each row to see if any variable is involved:
1) If not, tag updatec onto the data list and send it to the base
2) If so, tag updatev onto the data list and send it to the base
The results for each row are appended together and sent to the base.
%Outline of ProcessXXXCmd ( XXX denotes the command found )
- ProcessInsertCmd
From the local parser, the base obtains a list of insert queries, each tagged with the insertc or insertv command; therefore, for each insert query:
1) If tagged with insertc, invoke ProcessInsertConstantCmd
2) If tagged with insertv, invoke ProcessInsertVariableCmd
ProcessInsertConstantCmd
This module handles inserts that do not require a subquery; given the entry lists:
1) Extract all constants
2) Extract all attributes
3) Extract name of the relation to do insertion
4) Perform string manipulation to put the information together to form a valid SQL insertion query
ProcessInsertVariableCmd
Sometimes insert commands require subqueries, allowing users to combine a projection/selection query and an insert command into one step. The local parser concludes that an insertion requires a subquery as soon as it finds a variable in the row where an insert command, I., is found. The module is basically a combination of ProcessProjCmd and ProcessInsertConstantCmd, except that for simplicity no union or negation is supported in the subquery part (it should be a very easy modification of this module to incorporate union and negation, and it will be done if time permits):
1) Extract insert attributes
2) Variable substitution
3) Subquery generation following the case of projection
4) Repeat the four steps of the constants-only case above
- ProcessDeleteCmd/ProcessUpdateCmd
Formation of delete and update queries is, conceptually, very similar:
1) The specified constraints are first extracted,
2) Formation of a subquery in case variables are found
3) Eliminate the aliasing of relations,
4) Put the pieces together
Essentially the only differences are in the command names used, i.e. deletec and deletev instead of updatec and updatev, and in the syntax of the SQL statement. Interested readers are advised to compare the delete modules with the update modules to understand the differences.