Index: openacs-4/packages/acs-core-docs/www/db-api.html =================================================================== RCS file: /usr/local/cvsroot/openacs-4/packages/acs-core-docs/www/db-api.html,v diff -u -r1.31.2.2 -r1.31.2.3 --- openacs-4/packages/acs-core-docs/www/db-api.html 29 Apr 2004 14:50:50 -0000 1.31.2.2 +++ openacs-4/packages/acs-core-docs/www/db-api.html 5 Jul 2004 19:47:30 -0000 1.31.2.3 @@ -3,166 +3,127 @@

Overview

One of OpenACS's great strengths is that code written for it is very close to the database. It is very easy to interact with the - database from anywhere within OpenACS. Our goal is to develop a + database from anywhere within OpenACS, and we have a coherent API for database access which makes this even easier.

More detailed information about the DB api is available at Database Access API. -

The New Way

- Introduced in ACS 3.4, the new Database API is meant to save - developers from the above tedium and provide a more structured syntax - for specifying database operations, including transactions. - Here's an example of the API. +

DB API Examples

+ The OpenACS database API is meant to save developers from making + common mistakes and to provide a more structured syntax for + specifying database operations, including transactions. Here's + an example of the API.

-set count 0 
-
+set count 0
 set tcl_var "foo"
-
 set sql {
-       	select 
-            foo, 
-            bar, 
-            baz 
-        from some_table, some_other_table
-        where some_table.id = some_other_table.id  
-        and some_table.condition_p = :tcl_var
-      	}
-      
-db_transaction {
-        db_foreach my_example_query_name $sql {
-            	...
-            	call_some_proc $foo $bar $baz
-            	incr count
-        }
+      	SELECT foo, bar, baz
+       FROM some_table, some_other_table
+       WHERE some_table.id = some_other_table.id
+         and some_table.condition_p = :tcl_var
 }
-    

+ +db_transaction { + db_foreach my_example_query_name $sql { + lappend rows [list $foo $bar $baz] + incr count + } + foreach row $rows { + call_some_proc $foo $bar $baz + } +}

There are several things to note here:

  1. No explicit code for grabbing and releasing handles. Usage of the Database API implicitly deals with all handle management issues.

  2. - The new command db_transaction + The db_transaction command makes the scope of a transaction - clear. db_transaction takes the + clear; db_transaction takes the code block argument and automatically runs it in the context of a - transaction. + transaction. If you use something like db_foreach though, you need to + make sure that there are no calls in the code block which would take + a second db handle since the transaction is only valid for + one handle (thats why we build up a list of returned values and call + a second proc outside the db_foreach loop).

  3. - The new command db_foreach writes + The command db_foreach writes our old while loop for us.

  4. - Every SQL query has a name, meant to be unique within the server - instance (though this is not enforced). + Every SQL query has a name, which is used in conjunction with .XQL files + to support multiple databases.

  5. - Finally and most importantly, there is a new scheme for passing data - from a Tcl variable to the database, which we'll cover next. + Finally and most importantly, there API implements bind variables, which we will cover next.

Bind Variables

- Bind variables are placeholders for literal values in an SQL query - being sent to the server. Take the example query above: in the old - way, data was generally passed to Oracle directly, via Tcl string - interpolation. So in the example above, the actual query we send would - look like this: + Bind variables are placeholders for literal values in an SQL + query being sent to the server. In the old way, data was + generally passed to directly to the DB backend, via Tcl string + interpolation. In the example above, the query would look like:

-select 
-    foo, 
-    bar, 
-    baz 
+select foo, bar, baz 
 from some_table, some_other_table
 where some_table.id=some_other_table.id  
-and some_table.condition_p = 'foo'
-    

+and some_table.condition_p = '$foo'

There are a few problems with this:

  1. - If the literal value is a - huge string, then we waste a lot of time in the database server doing - useless parsing. + If the value of $foo is a huge string, then we waste a lot + of time in the database server doing useless parsing.

  2. - Second, if the literal value contains characters like - single quotes, we have to be careful to double-quote them, because not - quoting them will lead to surprising errors. + Second, if the literal value contains characters like single + quotes, we have to be careful to properly escape them, + because not quoting them will lead to surprising errors.

  3. - Third, no type checking - occurs on the literal value. Finally, if the Tcl variable is passed in - or between web forms or otherwise subject to external modification, - there is nothing keeping malicious users from setting the Tcl variable - to some string that changes the query textually. -

    + Third, no type checking occurs on the literal + value. Finally, if the Tcl variable is passed in or between + web forms or otherwise subject to external modification, + there is nothing keeping malicious users from setting the + Tcl variable to some string that changes the query + textually. This type of attack, called SQL smuggling, can be very damaging - entire tables can be - exposed or have their contents deleted, for example. Another very - important reason for using bind variables is performance. Oracle caches - all previously parsed queries. If there are values in the where clause, - that is how the query is cached. It also performs bind variable - susbstitution after parsing the SQL statement. This means that SQL - statements that use bind variables will always match (assuming all else is - the same) while SQL statements that do not use bind variables will not - match unless the values in the statement are exactly the same. This will - improve performance considerably. + exposed or have their contents deleted, for example. +

    + Another very important reason for using bind variables is + performance. Oracle can cache previously parsed queries. If + there are values in the where clause, that is how the query + is cached. It also performs bind variable susbstitution + after parsing the SQL statement. This means that SQL + statements that use bind variables will always match + (assuming all else is the same) while SQL statements that do + not use bind variables will not match unless the values in + the statement are exactly the same. This will improve the + query cache considerably, which can make the server much + more efficient.

- To fix all these problems, we replace literal values in the query with - a placeholder character, and then send the data along after. So the - query looks like this: -

-select 
-    foo, 
-    bar, 
-    baz 
-from some_table, some_other_table
-where some_table.id = some_other_table.id
-and some_table.condition_p = ?
-    

- The '?' character means "This will be filled in later with literal - data". In use, you might write code that looks like this: -

-set statement [prepare_query "
-    		select 
-        		foo, 
-        		bar, 
-        		baz 
-    		from some_table, some_other_table
-    		where some_table.id = some_other_table.id  
-    		and some_table.condition_p = ?
-	       "]
-
-[bind_param $statement 1 $tcl_var]
-    

- The above example is meant to be psuedo-Tcl - no API like this - actually exists. What happens is that we first send the SQL statement - to the server for parsing, then later we bind values to the - placeholders, and send those values along seperately. This seperate - binding step is where the term bind variable comes from. + What the DB API (in conjuntion with the database drivers + implemented for aolserver) do is send the SQL statement to the + server for parsing, then bind values to the + variables and sends those values along seperately as a second + step. This seperate binding step is where the term + bind variable comes from.

- This split has several advantages. First, type checking happens on the - literal. If the column we are comparing against holds numbers, and we - send a string, we get a nice error. Second, since string literals are - no longer in the query, no extra quoting is required. Third, - substitution of bind variables cannot change the actual text of the - query, only the literal values in the placeholders. + This split has several advantages. First, type checking happens + on the literal. If the column we are comparing against holds + numbers, and we send a string, we get a nice error. Second, + since string literals are no longer in the query, no extra + quoting is required. Third, substitution of bind variables + cannot change the actual text of the query, only the literal + values in the placeholders. The database API makes bind + variables easy to use by hooking them smoothly into the Tcl + runtime so you simply provide :tclvar and the value of $tclvar + is sent to the backend to actually execute the query.

- The database API makes bind variables easy to use by hooking them - smoothly into the Tcl runtime. Rather than using a '?' as a generic - placeholder, you use a colon followed by the name of the Tcl variable - that you wish to pass as a literal. So here's the final, real-life - form of the example query: -

-select 
-    foo, 
-    bar, 
-    baz 
-from some_table, some_other_table
-where some_table.id = some_other_table.id  
-and some_table.condition_p = :tcl_var
-    

The database API parses the query and pulls out all the bind variable specifications and replaces them with generic placeholders. It then automatically pulls the values of the named Tcl vars out of the runtime environment of the script, and passes them to the database.

Note that while this looks like a simple syntactic change, it really - is very different from how we've written queries in the past. You use + is very different from how interpolated text queries work. You use bind variables to replace what would otherwise be a literal value in a query, and Tcl style string interpolation does not happen. So you cannot do something like: @@ -175,7 +136,7 @@ SQL will not allow a literal to occur where we've put the bind variables, so the query is syntactically incorrect. You have to remember that while the bind variable syntax looks similar to variable - interpolation in Tcl, it is not the same thing at all. + interpolation in Tcl, It is not the same thing at all.

Finally, the DB API has several different styles for passing bind variable values to queries. In general, use the style presented here @@ -286,16 +247,12 @@

Therefore, the Database Access API provides a database-independent way to represent null (instead of the Oracle-specific idiom of the empty string): db_null.

Use it instead of the empty string whenever you want to set a column value - explicitly to null, e.g.:

-
-set bar [db_null]
+	explicitly to null, e.g.:

set bar [db_null]
 set baz [db_null]
 
 db_dml foo_create "insert into foo(bar, baz) values(:bar, :baz)"
 #
-# sets the values for both the "bar" and "baz" columns to null
-
-      

Sequence Pooling

+# sets the values for both the "bar" and "baz" columns to null

Sequence Pooling

The database library can transparently maintain pools of sequence values, so that each request for a new sequence value (using db_nextval) does not incur a roundtrip to the server. For instance, this functionality is @@ -310,7 +267,7 @@

in any configuration section in the yourservername.ini - file, e.g., e.g., + file, e.g.,

 
 [ns/server/yourservername/acs/security]
@@ -391,7 +348,6 @@
     and not include it in the multirow. Or you can call break
     to skip this row and quit looping.
     

- Notice the nonstandard numbering (everything else in Tcl starts at 0); the reason is that the graphics designer, a non programmer, may wish to work with row numbers. @@ -682,4 +638,4 @@

($Id$)

-

View comments on this page at openacs.org
+

View comments on this page at openacs.org