vol. 13 no. 2, June 2008

 

Watch this: LINQ shifts the paradigm of query

LINQ = .NET Language-Integrated Query

Terrence A. Brooks

Information School, University of Washington, Seattle, WA 98195, USA



"Query" "search" - What's the difference?

Search is accomplished by submitting a query to an information store. A query structures your search in accordance with the structure of that store. To illustrate, suppose we search for "dog" in a relational database (using SQL, the structured query language) and in an XML document (using XPath):

SQL: SELECT * FROM book WHERE title = 'dog'

XPath: /book[@title = "dog"]/*

These two different query structures perform equivalent searches. They illustrate how the architecture of a particular storage medium intrudes into application programming, demanding an awareness that "now I'm querying a database" or "now I'm querying an XML document".

This promotes the paradigm that there are at least two worlds: a database world and an XML world.


Paradigm shift: Use the same LINQ query with all your storage media.

Using LINQ means that you no longer live in a database world or an XML world.
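
As a minimal sketch of this claim, assume two hypothetical in-memory stores that are not part of the article: an array of book titles and a small XML document holding the same titles. The names SameQueryDemo, TitleArray and TitleXml are invented for illustration. The from/where/select shape is the same for both stores; only the expression that reads a value out of each item differs.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class SameQueryDemo
{
    // Hypothetical sample data: an in-memory array of book titles.
    static readonly string[] TitleArray = { "cat", "dog", "emu" };

    // Hypothetical sample data: the same titles held in an XML document.
    static readonly XDocument TitleXml = XDocument.Parse(
        "<books><book title='cat'/><book title='dog'/><book title='emu'/></books>");

    // Query the array for titles equal to the search term.
    public static string[] SearchArray(string term)
    {
        var hits = from title in TitleArray
                   where title == term
                   select title;
        return hits.ToArray();
    }

    // Query the XML document with the same from/where/select shape.
    public static string[] SearchXml(string term)
    {
        var hits = from book in TitleXml.Descendants("book")
                   where (string)book.Attribute("title") == term
                   select (string)book.Attribute("title");
        return hits.ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", SearchArray("dog")));
        Console.WriteLine(string.Join(",", SearchXml("dog")));
    }
}
```

Both methods return the single title "dog". The storage medium shows up only in how each item is read, not in the structure of the query.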

Historically, storage architecture has dominated query structure

Database as a network structure

Charles Bachman's "The programmer as navigator" (Communications of the ACM, 16(11), 653-658) described the database programmer as navigating among database records by targeting information keys and secondary keys. Here query expressed the link structure among database records.

Database as table structure

Ted Codd's "A relational model of data for large shared data banks" (Communications of the ACM, 13(6), 377-387) introduced the table metaphor as a data structure. Relational databases led to the development of SQL, the structured query language. Here query expressed a relationship among the rows and columns of tables.

Information as a tree structure

Jon Bosak's "The birth of XML: a personal recollection" describes the application of the tree metaphor as a data structure. This led to the development of XPath, a language for selecting nodes by branching from root node to leaf node.

LINQ abstracts data structure from query

Anders Hejlsberg, chief architect of the C# programming language, introduced LINQ (.NET Language-Integrated Query) at the 2005 Professional Developers Conference. He described uniform query across domains such as databases, XML and objects such as arrays: any information store that permits one-by-one access to its contents. A wedge is driven between query and the particular structures of storage media. Inside an integrated development environment such as Visual Studio, query becomes a first-class language construct with IntelliSense auto-completion and compile-time checking.
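
In C#, "one-by-one access" corresponds to the IEnumerable<T> interface. The following sketch assumes a hypothetical store, StoogeStream, that offers nothing except enumeration of its contents; the class and method names are invented for illustration, and the names it yields are sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OneByOneDemo
{
    // Hypothetical store: it exposes only one-at-a-time access
    // to its contents, through a C# iterator (yield return).
    public static IEnumerable<string> StoogeStream()
    {
        yield return "Moe";
        yield return "Larry";
        yield return "Curly";
    }

    // The query knows nothing about how the names are stored,
    // only that they can be enumerated.
    public static List<string> NamesContaining(char letter)
    {
        var query = from name in StoogeStream()
                    where name.IndexOf(letter) >= 0
                    orderby name
                    select name;
        return query.ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", NamesContaining('r')));
    }
}
```

Anything that can be enumerated this way, whether an array, a list, or a sequence of XML elements, can sit in the from clause of a query.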

If I use a LINQ query, I really don't care how you've structured your information.

A LINQ example with different kinds of information store

An SQL database

Suppose you have an SQL database detailing the Three Stooges and their haircuts:

Figure 2

An XML document

Suppose you have an XML document detailing the Stooges and their birth and death dates:

<?xml version="1.0" encoding="UTF-8"?>
<threeStooges>
	<Stooge>
		<stoogeName>Moe</stoogeName>
		<birthDate>June 19, 1897</birthDate>
		<deathDate>May 4, 1975</deathDate>
	</Stooge>
...
</threeStooges>

An array of objects

Suppose you define an object such as stoogeFacts.

class stoogeFacts
    {
        public string stoogeName { get; set; }
        public string familyName { get; set; }
    }

And create an array of these objects detailing the names of the Stooges:

stoogeFacts[] familyFacts = new stoogeFacts[3];
familyFacts[0] = new stoogeFacts();
    familyFacts[0].stoogeName = "Moe";
    familyFacts[0].familyName = "Howard";
...

Here is one LINQ query to three different sources:

using System.Linq;      // query operators (from, join, select)
using System.Xml.Linq;  // XDocument

// Target the XML document
XDocument xmlSource = XDocument.Load("stooges.xml");

// Target the database
stoogeClassesDataContext stoogeContext = new stoogeClassesDataContext();

// The LINQ structure
var stoogeGuys = 
     // Beginning with the XML source
     from xmlGuys in xmlSource.Descendants("Stooge")
     // Join to the array on the common element "stoogeName"
     join arrayGuys in familyFacts 
           on xmlGuys.Element("stoogeName").Value equals arrayGuys.stoogeName
     // Join to the database on the common element "stoogeName"
     join dbGuys in stoogeContext.stoogeTables 
           on xmlGuys.Element("stoogeName").Value equals dbGuys.stoogeName 
     select new
     {
        firstName    = dbGuys.stoogeName,
        familyName   = arrayGuys.familyName,
        birthDate    = xmlGuys.Element("birthDate").Value,
        deathDate    = xmlGuys.Element("deathDate").Value,
        hairCutStyle = dbGuys.stoogeHaircut,
     };

Figure 3
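
The query above depends on a database and a generated data context that are not reproduced here. The following self-contained sketch runs the same two joins, assuming an in-memory list (stoogeTable, with a hypothetical StoogeRow class) as a stand-in for the database table. The haircut value and the second array entry are illustrative, not taken from the article's figures.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for one row of the database table.
class StoogeRow
{
    public string stoogeName { get; set; }
    public string stoogeHaircut { get; set; }
}

class StoogeFacts
{
    public string stoogeName { get; set; }
    public string familyName { get; set; }
}

static class ThreeSourceJoin
{
    public static List<string> Run()
    {
        // XML source, inlined instead of loaded from stooges.xml.
        XDocument xmlSource = XDocument.Parse(
            "<threeStooges><Stooge>" +
            "<stoogeName>Moe</stoogeName>" +
            "<birthDate>June 19, 1897</birthDate>" +
            "<deathDate>May 4, 1975</deathDate>" +
            "</Stooge></threeStooges>");

        // Array source; Larry has no XML or table entry, so the joins drop him.
        StoogeFacts[] familyFacts =
        {
            new StoogeFacts { stoogeName = "Moe", familyName = "Howard" },
            new StoogeFacts { stoogeName = "Larry", familyName = "Fine" }
        };

        // In-memory list standing in for the database table (illustrative value).
        var stoogeTable = new List<StoogeRow>
        {
            new StoogeRow { stoogeName = "Moe", stoogeHaircut = "bowl cut" }
        };

        var stoogeGuys =
            from xmlGuys in xmlSource.Descendants("Stooge")
            join arrayGuys in familyFacts
                on xmlGuys.Element("stoogeName").Value equals arrayGuys.stoogeName
            join dbGuys in stoogeTable
                on xmlGuys.Element("stoogeName").Value equals dbGuys.stoogeName
            select dbGuys.stoogeName + " " + arrayGuys.familyName +
                   " (" + xmlGuys.Element("birthDate").Value + " - " +
                   xmlGuys.Element("deathDate").Value + "), " + dbGuys.stoogeHaircut;

        return stoogeGuys.ToList();
    }

    static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```

Because join is an inner join, only Stooges present in all three sources appear in the result. Replacing stoogeTable with a real table from a data context would leave the query text unchanged.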

What does this paradigm shift signal?

Complementary developments of database architecture

New hybrid or multi-structured database management systems have addressed the two-world problem of storage (database or XML?) by offering both paradigms simultaneously. An example is DB2 Universal Database:

"To efficiently manage traditional SQL data types and XML data, DB2 includes two distinct storage mechanisms. However, it's important to note that the underlying storage mechanism used for a given data type is transparent to the application. In other words, the application doesn't need to explicitly specify which storage mechanism to use or manage physical aspects of storage, such as splitting portions of XML documents across multiple database pages. It simply enjoys the runtime performance benefits of storing and querying data in a format that's efficient for the target data." ("What's new in DB2 Viper" by Cynthia M. Saracco, 9 February 2006)


The post-database world

Our database heritage comes from deep in the 20th century. We are used to seeing the architecture of storage media expressed in queries written as double-quoted strings that are treated differently from the surrounding code. LINQ represents a paradigm shift in query, and the corresponding new flexibility in storage structure abstracts those messy details from the logic of our programs.
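
The contrast between a double-quoted query and a language-integrated one can be sketched as follows, assuming a one-element version of the Stooges document. The class and method names are invented for illustration. XPathSelectElements comes from the System.Xml.XPath namespace. Note that in LINQ to XML the element names are still strings; it is the query operators themselves that the compiler checks.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class QuotedVersusIntegrated
{
    static readonly XDocument Doc = XDocument.Parse(
        "<threeStooges><Stooge><stoogeName>Moe</stoogeName></Stooge></threeStooges>");

    // The whole query is a double-quoted string: the compiler sees only text,
    // so a malformed expression is discovered at run time.
    // (Concatenating input into a query string is shown only for contrast.)
    public static int CountWithXPathString(string name)
    {
        string xpath = "/threeStooges/Stooge[stoogeName='" + name + "']";
        return Doc.XPathSelectElements(xpath).Count();
    }

    // The query is ordinary C#: from, where and select are checked by
    // the compiler, and the search term is a normal typed variable.
    public static int CountWithLinq(string name)
    {
        var hits = from stooge in Doc.Descendants("Stooge")
                   where (string)stooge.Element("stoogeName") == name
                   select stooge;
        return hits.Count();
    }

    static void Main()
    {
        Console.WriteLine(CountWithXPathString("Moe"));
        Console.WriteLine(CountWithLinq("Moe"));
    }
}
```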

Date: April 25, 2008

For further reading

Querying XML documents with LINQ to XML

Language-Integrated Query (LINQ) is a new approach that unifies the way data can be retrieved in .NET. There already has been a lot of talking about the power of LINQ and if you had the chance to test its capabilities you can surely confirm that. To resume in a few words what particular feature nominates LINQ as an efficient and elegant approach for data accessing, I want to mention that almost any data structure (arrays, relational data, collections and XML) can become a data source using LINQ.

LINQ Query Expressions (C# Programming Guide)

For a developer who writes queries, the most visible "language-integrated" part of LINQ is the query expression. Query expressions are written in a declarative query syntax introduced in C# 3.0. By using query syntax, you can perform even complex filtering, ordering, and grouping operations on data sources with a minimum of code. You use the same basic query expression patterns to query and transform data in SQL databases, ADO.NET Datasets, XML documents and streams, and .NET collections.

Next generation data access with LINQ

"Firing Up the hybrid engine" by Anjul Bhambhri

True-native support for both XML and relational data. "Hybrid systems don't mandate that all data be represented as relational data, nor do they require that all data be in XML; instead, they provide the choice of the right model for the right task."


How to cite this paper

Brooks, T.A. (2008). "Watch this: LINQ shifts the paradigm of query"   Information Research, 13(2) paper TB0806 [Available at http://InformationR.net/ir/13-2/TB0806.html]




© the author, 2008.
Last updated: 18 May, 2008