Automatic conflict resolution to integrate relational schema
With the constantly increasing reliance on database systems to store, process, and display data comes the problem of ensuring interoperability between these systems. On a wider scale, the World-Wide Web (WWW) gives users access to a vast number of data sources distributed across the planet. However, a fundamental problem with distributed data access is determining which data are semantically equivalent. Ideally, users should be able to extract data from multiple sites and have it automatically combined and presented to them in a usable form. No system has accomplished these goals, owing to limitations in expressing and capturing data semantics. Schema integration is required to provide database interoperability and involves the resolution of naming, structural, and semantic conflicts. To date, automatic schema integration has not been possible. This thesis demonstrates that integration may be increasingly automated by capturing data semantics using a standard dictionary. It proposes an architecture for automatically constructing an integrated view by combining local views, each defined by independently expressing database semantics in an XML document (X-Spec), using only a pre-defined dictionary as the binding between integration sites. The dictionary eliminates naming conflicts and reduces semantic conflicts. Structural conflicts are resolved at query time by translating from the semantic integrated view to structural queries. The system provides both logical and physical access transparency by mapping user queries on high-level concepts to schema elements in the underlying data sources. The architecture automatically integrates relational databases, and its application of standardization to the integration problem is unique.
The architecture may be deployed in a centralized or distributed fashion. It preserves full database autonomy while allowing transparent access to all databases participating in a global federation, without requiring users to know the underlying data sources, their locations, or their structures. Thus, the contribution is a system that provides transparency to users while preserving autonomy for all participating systems. A distributed deployment allows integration using a web browser, and would have a major impact on how the Web is used and delivered. The integration software, Unity, is the bridge between concept and implementation. Unity is a complete software package for constructing and modifying standard dictionaries, parsing database schemas and metadata to construct X-Specs, combining X-Specs into an integrated view, and querying that view transparently. Integration results obtained using Unity illustrate the usefulness of the approach.
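The dictionary-based integration idea can be illustrated with a minimal Python sketch. It is not Unity's implementation, and it does not reproduce the X-Spec format: plain Python mappings stand in for the XML documents, and every name in it (the sources `db_a` and `db_b`, the tables and columns, the dictionary terms such as `Employee.Name`, and the functions `build_integrated_view` and `translate_query`) is a hypothetical chosen for illustration. The sketch assumes each local view simply annotates schema elements with terms from a shared dictionary, so that elements from different databases are combined when they carry the same term, and a query on terms is translated into one structural query per source.

```python
from collections import defaultdict

# Hypothetical stand-ins for X-Specs: each source maps (table, column)
# to a term from a shared, pre-defined dictionary.
LOCAL_VIEWS = {
    "db_a": {
        ("emp", "ename"): "Employee.Name",
        ("emp", "sal"): "Employee.Salary",
    },
    "db_b": {
        ("staff", "full_name"): "Employee.Name",
        ("staff", "dept"): "Employee.Department",
    },
}


def build_integrated_view(local_views):
    """Group schema elements from all sources under their dictionary term."""
    view = defaultdict(list)
    for source, mapping in local_views.items():
        for (table, column), term in mapping.items():
            view[term].append((source, table, column))
    return dict(view)


def translate_query(integrated_view, terms):
    """Translate a query on dictionary terms into per-source SELECT statements."""
    per_source = defaultdict(lambda: defaultdict(list))
    for term in terms:
        for source, table, column in integrated_view.get(term, []):
            per_source[source][table].append(f'{column} AS "{term}"')
    return {
        source: [
            f"SELECT {', '.join(columns)} FROM {table}"
            for table, columns in tables.items()
        ]
        for source, tables in per_source.items()
    }


if __name__ == "__main__":
    view = build_integrated_view(LOCAL_VIEWS)
    for source, queries in translate_query(
        view, ["Employee.Name", "Employee.Salary"]
    ).items():
        print(source, queries)
```

In this sketch the naming conflict between `ename` and `full_name` disappears because both columns are annotated with the same dictionary term, and the user names only terms, never sources or tables. Joins, semantic conflicts, and merging of the returned rows are deliberately left out.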