By Pete Su and Jon Salz. Modified by Roberto Mello.
One of OpenACS's great strengths is that code written for it is very close to the database. It is very easy to interact with the database from anywhere within OpenACS, and we have a coherent API for database access which makes this even easier.
More detailed information about the DB API is available at Database Access API.
The OpenACS database API is meant to save developers from making common mistakes and to provide a more structured syntax for specifying database operations, including transactions. Here's an example of the API.
set count 0
set tcl_var "foo"
set rows [list]
set sql {
    SELECT foo, bar, baz
    FROM some_table, some_other_table
    WHERE some_table.id = some_other_table.id
      and some_table.condition_p = :tcl_var
}

db_transaction {
    db_foreach my_example_query_name $sql {
        lappend rows [list $foo $bar $baz]
        incr count
    }
    foreach row $rows {
        lassign $row foo bar baz
        call_some_proc $foo $bar $baz
    }
}
There are several things to note here:
No explicit code for grabbing and releasing handles. Usage of the Database API implicitly deals with all handle management issues.
The db_transaction command makes the scope of a transaction clear; db_transaction takes a code block argument and automatically runs it in the context of a transaction. If you use something like db_foreach inside the transaction, though, you need to make sure that there are no calls in the code block which would take a second db handle, since the transaction is only valid for one handle (that's why we build up a list of returned values and call a second proc outside the db_foreach loop).
The db_foreach command writes our old while loop for us.
Every SQL query has a name, which is used in conjunction with .XQL files to support multiple databases.
Finally, and most importantly, the API implements bind variables, which we will cover next.
Bind variables are placeholders for literal values in an SQL query being sent to the server. In the old way, data was generally passed directly to the DB backend via Tcl string interpolation. In the example above, the query would look like:
select foo, bar, baz
from some_table, some_other_table
where some_table.id = some_other_table.id
  and some_table.condition_p = '$tcl_var'
There are a few problems with this:
If the value of $tcl_var is a huge string, then we waste a lot of time in the database server doing useless parsing.
Second, if the literal value contains characters like single quotes, we have to be careful to properly escape them, because failing to do so will lead to surprising errors.
Third, no type checking occurs on the literal value. Finally, if the Tcl variable is passed in from a web form or is otherwise subject to external modification, there is nothing keeping malicious users from setting it to a string that changes the query textually, as sketched below. This type of attack, called SQL smuggling (better known today as SQL injection), can be very damaging: entire tables can be exposed or have their contents deleted, for example.
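To make the textual-rewriting risk concrete, here is a minimal sketch (the table, column, and variable names are illustrative) of how an interpolated value can change the meaning of a query, while a bind variable passes the same string through as plain data:

# Value supplied by a malicious user, e.g. through a form field:
set tcl_var "x' or 'a' = 'a"

# With string interpolation the value becomes part of the SQL text itself,
# so the statement below ends up as
#   select foo from some_table where condition_p = 'x' or 'a' = 'a'
# which matches every row in the table:
set sql "select foo from some_table where condition_p = '$tcl_var'"

# With a bind variable the statement text never changes; the whole string
# is sent to the database as data and compared against condition_p:
db_foreach smuggling_demo {
    select foo from some_table where condition_p = :tcl_var
} {
    # only reached for rows whose condition_p is literally "x' or 'a' = 'a"
}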
Another very important reason for using bind variables is performance. Oracle can cache previously parsed queries, and the statement text is the cache key: if literal values appear in the where clause, the statement is cached with those values embedded in it. Bind variable substitution happens after parsing, so SQL statements that use bind variables will always match a previously parsed statement (assuming all else is the same), while statements that interpolate literal values only match when those values are exactly the same. This improves the hit rate of the query cache considerably, which can make the server much more efficient.
What the DB API does (in conjunction with the database drivers implemented for AOLserver) is send the SQL statement to the server for parsing, then bind values to the variables and send those values along separately in a second step. This separate binding step is where the term bind variable comes from.
This split has several advantages. First, type checking happens on the literal. If the column we are comparing against holds numbers, and we send a string, we get a nice error. Second, since string literals are no longer in the query, no extra quoting is required. Third, substitution of bind variables cannot change the actual text of the query, only the literal values in the placeholders. The database API makes bind variables easy to use by hooking them smoothly into the Tcl runtime so you simply provide :tclvar and the value of $tclvar is sent to the backend to actually execute the query.
The database API parses the query and pulls out all the bind variable specifications and replaces them with generic placeholders. It then automatically pulls the values of the named Tcl vars out of the runtime environment of the script, and passes them to the database.
Note that while this looks like a simple syntactic change, it really is very different from how interpolated text queries work. You use bind variables to replace what would otherwise be a literal value in a query, and Tcl-style string interpolation does not happen. So you cannot do something like:
set table "baz" set condition "where foo = bar" db_foreach my_query { select :table from some_table where :condition }
SQL will not allow a literal to occur where we've put the bind variables, so the query is syntactically incorrect. You have to remember that while the bind variable syntax looks similar to variable interpolation in Tcl, it is not the same thing at all.
Finally, the DB API has several different styles for passing bind variable values to queries. In general, use the style presented here because it is the most convenient.
Every db_* command accepting a SQL command as an argument supports bind variables. You can either:

Specify the -bind switch to provide an ns_set with bind variable values, or

Specify the -bind switch to explicitly provide a list of bind variable names and values, or

Not specify a bind variable list at all, in which case local Tcl variables are used as bind variables.
The default behavior (i.e., if the -bind switch is omitted) is that these procedures expect to find local variables that correspond in name to the referenced bind variables, e.g.:
set user_id 123456
set role "administrator"

db_foreach user_group_memberships_by_role {
    select g.group_id, g.group_name
    from user_groups g, user_group_map map
    where g.group_id = map.group_id
      and map.user_id = :user_id
      and map.role = :role
} {
    # do something for each group of which user 123456 is in the role
    # of "administrator"
}
The value of the local Tcl variable user_id (123456) is bound to the user_id bind variable.
The -bind switch can take the name of an ns_set containing keys for each bind variable named in the query, e.g.:
set bind_vars [ns_set create]
ns_set put $bind_vars user_id 123456
ns_set put $bind_vars role "administrator"

db_foreach user_group_memberships_by_role {
    select g.group_id, g.group_name
    from user_groups g, user_group_map map
    where g.group_id = map.group_id
      and map.user_id = :user_id
      and map.role = :role
} -bind $bind_vars {
    # do something for each group in which user 123456 has the role
    # of "administrator"
}
Alternatively, as an argument to -bind you can specify a list of alternating name/value pairs for bind variables:
db_foreach user_group_memberships_by_role {
    select g.group_id, g.group_name
    from user_groups g, user_group_map map
    where g.group_id = map.group_id
      and map.user_id = :user_id
      and map.role = :role
} -bind [list user_id 123456 role "administrator"] {
    # do something for each group in which user 123456 has the role
    # of "administrator"
}
When processing a DML statement, Oracle coerces empty strings into null. (This coercion does not occur in the WHERE clause of a query, i.e., col = '' and col is null are not equivalent.)
As a result, when using bind variables, the only way to make Oracle set a column value to null is to set the corresponding bind variable to the empty string, since a bind variable whose value is the string "null" will be interpreted as the literal string "null".
These Oracle quirks make the process of writing clear and abstract DML difficult. Here is an example that illustrates why:
#
# Given the table:
#
#   create table foo (
#       bar  integer,
#       baz  varchar(10)
#   );
#

set bar ""
set baz ""

db_dml foo_create "insert into foo(bar, baz) values(:bar, :baz)"

#
# The values of the "bar" and "baz" columns in the new row are both
# null, because Oracle has coerced the empty string (even for the
# numeric column "bar") into null in both cases.
Since databases other than Oracle do not coerce empty strings into null, this code has different semantics depending on the underlying database (i.e., the row that gets inserted may not have null as its column values), which defeats the purpose of SQL abstraction. Therefore, the Database Access API provides a database-independent way to represent null (instead of the Oracle-specific idiom of the empty string): db_null. Use it instead of the empty string whenever you want to set a column value explicitly to null, e.g.:
set bar [db_null]
set baz [db_null]

db_dml foo_create "insert into foo(bar, baz) values(:bar, :baz)"
#
# Sets the values for both the "bar" and "baz" columns to null.
The database library can transparently maintain pools of sequence values, so that each request for a new sequence value (using db_nextval) does not incur a roundtrip to the server. For instance, this functionality is very useful in the security/sessions library, which very frequently allocates values from the sec_id_seq sequence. To utilize this functionality for a particular sequence, register the sequence to be pooled, either using the db_register_pooled_sequence procedure at server startup time, or by including a configuration parameter of the form PoolSequence.sequence_name_seq=count in any configuration section in the yourservername.ini file, e.g.,
[ns/server/yourservername/acs/security]
PoolSequence.sec_id_seq=20
The database library will allocate this number of sequence values at server startup. It will periodically scan pools and allocate new values for sequences which are less than half-full. (This normally occurs every 60 seconds, and is configurable via the PooledSequenceUpdateInterval parameter in the [ns/server/yourservername/acs/database] configuration section.)
The Database API has several functions that wrap familiar parts of the AOLserver database API.
Note that you never have to use ns_db anymore (including ns_db gethandle)! Just start doing stuff, and (if you want) call db_release_unused_handles when you're done, as a hint to release the database handle.
db_abort_transaction
db_abort_transaction
Aborts all levels of a transaction. That is, if this is called within several nested transactions, all of them are terminated. Use this instead of db_dml "abort" "abort transaction".
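A minimal sketch of aborting from inside a transaction (the foo table and the valid_foo_p helper are hypothetical):

set bar "some value"

db_transaction {
    db_dml insert_foo {insert into foo(bar) values(:bar)}

    # valid_foo_p is a hypothetical application-level check; if it fails,
    # db_abort_transaction rolls back every level of the enclosing
    # transaction(s).
    if { ![valid_foo_p $bar] } {
        db_abort_transaction
    }
}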
db_multirow
db_multirow [ -local ] [ -append ] [ -extend column_list ] \
    var-name statement-name sql \
    [ -bind bind_set_id | -bind bind_value_list ] \
    code_block [ if_no_rows if_no_rows_block ]
Performs the SQL query sql, saving results in variables of the form var_name:1, var_name:2, etc., setting var_name:rowcount to the total number of rows, and setting var_name:columns to a list of column names.
Each row also has a column, rownum, automatically added and set to the row number, starting with 1. Note that this will override any column in the SQL statement named 'rownum', including Oracle's rownum pseudo-column if you are selecting it.
If the -local switch is passed, the variables defined by db_multirow will be set locally (useful if you're compiling dynamic templates in a function or similar situations).
You may supply a code block, which will be executed for each row in the loop. This is very useful if you need to make computations that are better done in Tcl than in SQL, for example using ns_urlencode or ad_quotehtml, etc. When the Tcl code is executed, all the columns from the SQL query will be set as local variables in that code. Any changes made to these local variables will be copied back into the multirow.
You may also add additional, computed columns to the multirow using the -extend { col_1 col_2 ... } switch. This is useful for things like constructing a URL for the object retrieved by the query.
If you're constructing your multirow through multiple queries with the same set of columns, but with different rows, you can use the -append switch. This causes the rows returned by this query to be appended to the rows already in the multirow, instead of starting a clean multirow, as is the normal behavior. The columns must match the columns in the original multirow, or an error will be thrown.
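For instance, a sketch of building one multirow from two queries with the same columns (the table and column names are illustrative):

# First query starts the multirow.
db_multirow items select_books {
    select title, price from books
}

# Second query appends its rows; the columns must match.
db_multirow -append items select_cds {
    select title, price from cds
}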
Your code block may call continue in order to skip a row and not include it in the multirow, or call break to skip the current row and stop looping.
Notice the nonstandard numbering (everything else in Tcl starts at 0); the reason is that the graphics designer, a non-programmer, may wish to work with row numbers.
Example:
db_multirow -extend { user_url } users users_query {
    select user_id, first_names, last_name, email
    from cc_users
} {
    set user_url [acs_community_member_url -user_id $user_id]
}
You can also iterate over a multirow after it has been created; check the documentation for template::multirow. For example,
db_multirow assets assets {
    select asset_id from ...
}
...
set asset_id_l [list]
template::multirow foreach assets {
    lappend asset_id_l $asset_id
}
Technically it's equivalent to using a code block on the end of your db_multirow.
db_null
db_null
Returns a value which can be used in a bind variable to represent the SQL value null. See Nulls and Bind Variables above.
db_foreach
db_foreach statement-name sql [ -bind bind_set_id | -bind bind_value_list ] \
    [ -column_array array_name | -column_set set_name ] \
    code_block [ if_no_rows if_no_rows_block ]
Performs the SQL query sql, executing code_block once for each row with variables set to column values (or a set or array populated if -column_array or -column_set is specified). If the query returns no rows, executes if_no_rows_block (if provided).
Example:
db_foreach select_foo "select foo, bar from greeble" {
    doc_body_append "<li>foo=$foo; bar=$bar\n"
} if_no_rows {
    doc_body_append "<li>There are no greebles in the database.\n"
}
The code block may contain break statements (which terminate the loop and flush the database handle) and continue statements (which continue to the next row of the loop).
db_1row
db_1row statement-name sql [ -bind bind_set_id | -bind bind_value_list ] \
    [ -column_array array_name | -column_set set_name ]
Performs the SQL query sql, setting variables to column values. Raises an error if the query does not return exactly 1 row.
Example:
db_1row select_foo "select foo, bar from greeble where greeble_id = :greeble_id"
# Bombs if there's no such greeble!
# Now $foo and $bar are set.
db_0or1row
db_0or1row statement-name sql [ -bind bind_set_id | -bind bind_value_list ] \
    [ -column_array array_name | -column_set set_name ]
Performs the SQL query sql. If a row is returned, sets variables to column values and returns 1. If no rows are returned, returns 0. If more than one row is returned, throws an error.
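A minimal usage sketch (the table and column names are illustrative):

set user_id 123456

if { [db_0or1row user_email "select email from users where user_id = :user_id"] } {
    # exactly one row: $email is now set
} else {
    # no such user
}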
db_nextval
db_nextval sequence-name
Returns the next value for the sequence sequence-name (using a SQL statement like SELECT sequence-name.nextval FROM DUAL). If sequence pooling is enabled for the sequence, transparently uses a value from the pool if available to save a round-trip to the database (see Sequence Pooling).
db_register_pooled_sequence
db_register_pooled_sequence sequence-name pool-size
Registers the sequence sequence-name to be pooled, with a pool size of pool-size sequence values (see Sequence Pooling).
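For instance, a sketch of registering the sec_id_seq sequence mentioned earlier and then drawing values from it (the pool size of 20 matches the configuration example above and is otherwise arbitrary):

# Typically run at server startup time.
db_register_pooled_sequence sec_id_seq 20

# Later calls get values from the pool without a database round-trip
# whenever pooled values are available.
set new_id [db_nextval sec_id_seq]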
db_string
db_string statement-name sql [ -default default ] [ -bind bind_set_id | -bind bind_value_list ]
Returns the first column of the result of SQL query sql. If sql doesn't return a row, returns default (or throws an error if default is unspecified). Analogous to database_to_tcl_string and database_to_tcl_string_or_null.
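For example (the table and column names are illustrative, and $user_id is assumed to hold a user's id):

# Throws an error if the query returns no rows.
set n_users [db_string count_users "select count(*) from users"]

# Returns "" instead of erroring when no matching row exists.
set email [db_string user_email {
    select email from users where user_id = :user_id
} -default ""]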
db_list
db_list statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
Returns a Tcl list of the values in the first column of the result of SQL query sql. If sql doesn't return any rows, returns an empty list. Analogous to database_to_tcl_list.
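For example (illustrative names again):

# One list element per row, taken from the single selected column.
set emails [db_list user_emails "select email from users where banned_p = 'f'"]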
db_list_of_lists
db_list_of_lists statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
Returns a Tcl list, each element of which is a list of all column values in a row of the result of SQL query sql. If sql doesn't return any rows, returns an empty list. (Analogous to database_to_tcl_list_list.)
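For example, each element of the result below is a two-element {email user_id} list (the table and columns are illustrative):

set users [db_list_of_lists user_rows "select email, user_id from users"]

foreach row $users {
    lassign $row email user_id
    # ... do something with $email and $user_id ...
}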
db_dml
db_dml statement-name sql \
    [ -bind bind_set_id | -bind bind_value_list ] \
    [ -blobs blob_list | -clobs clob_list |
      -blob_files blob_file_list | -clob_files clob_file_list ]
Performs the DML or DDL statement sql. If a length-n list of blobs or clobs is provided, then the SQL should return n blobs or clobs into the bind variables :1, :2, ... :n. blobs or clobs, if specified, should be a list of individual BLOBs or CLOBs to insert; blob_files or clob_files, if specified, should be a list of paths to files containing the data to insert. Only one of -blobs, -clobs, -blob_files, and -clob_files may be provided.
Example:
db_dml insert_photos {
    insert into photos(photo_id, image, thumbnail_image)
    values(photo_id_seq.nextval, empty_blob(), empty_blob())
    returning image, thumbnail_image into :1, :2
} -blob_files [list "/var/tmp/the_photo" "/var/tmp/the_thumbnail"]
This inserts a new row into the photos table, with the contents of the files /var/tmp/the_photo and /var/tmp/the_thumbnail in the image and thumbnail_image columns, respectively.
db_write_clob, db_write_blob, db_blob_get_file
db_write_clob statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
db_write_blob statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
db_blob_get_file statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
Analogous to ns_ora write_clob/write_blob/blob_get_file.
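For instance, a sketch of streaming a stored image back to the client with db_write_blob, assuming $photo_id is set and using the hypothetical photos table from the db_dml example above:

# Writes the blob in the selected column to the client connection.
db_write_blob get_photo {
    select image from photos where photo_id = :photo_id
}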
db_release_unused_handles
db_release_unused_handles
Releases any allocated, unused database handles.
db_transaction
db_transaction code_block [ on_error { code_block } ]
Executes code_block transactionally. Nested transactions are supported (end transaction is transparently ns_db dml'ed when the outermost transaction completes). The db_abort_transaction command can be used to abort all levels of transactions. It is possible to specify an optional on_error code block that will be executed if some code in code_block throws an exception. The variable errmsg will be bound in that scope. If there is no on_error code, any errors will be propagated.
Example:
proc replace_the_foo { col } {
    db_transaction {
        db_dml delete {delete from foo}
        db_dml insert {insert into foo(col) values(:col)}
    }
}

proc print_the_foo {} {
    doc_body_append "foo is [db_string unused "select col from foo"]<br>\n"
}

replace_the_foo 8
print_the_foo     ; # Writes out "foo is 8"

db_transaction {
    replace_the_foo 14
    print_the_foo ; # Writes out "foo is 14"
    db_dml insert_foo {insert into some_other_table(col) values(999)}
    ...
    db_abort_transaction
} on_error {
    doc_body_append "Error in transaction: $errmsg"
}

print_the_foo     ; # Writes out "foo is 8"
db_resultrows
db_resultrows
Returns the number of rows affected or returned by the previous statement.
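For example (the table and column are illustrative):

db_dml prune_foo {delete from foo where expired_p = 't'}
set n_deleted [db_resultrows]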
db_with_handle
db_with_handle var code_block
Places a database handle into the variable var and executes code_block. This is useful when you don't want to have to use the new API (db_foreach, db_1row, etc.), but need to use database handles explicitly.
Example:
proc lookup_the_foo { foo } {
    db_with_handle db {
        return [db_string unused "select ..."]
    }
}

db_with_handle db {
    # Now there's a database handle in $db.
    set selection [ns_db select $db "select foo from bar"]

    while { [ns_db getrow $db $selection] } {
        set_variables_after_query
        lookup_the_foo $foo
    }
}
db_nullify_empty_string
db_nullify_empty_string string
For true SQL purists, we provide the convenience function db_nullify_empty_string, which returns [db_null] if its string argument is the empty string, and which can be used to encapsulate another Oracle quirk:
set baz "" # Clean out the foo table # db_dml unused {delete from foo} db_dml unused {insert into foo(baz) values(:baz)} set n_rows [db_string unused "select count(*) from foo where baz is null"] # # $n_rows is 1; in effect, the "baz is null" criterion is matching # the empty string we just inserted (because of Oracle's coercion # quirk)
To balance out this asymmetry, you can explicitly set baz to null by writing:
db_dml foo_insert {insert into foo(baz) values(:baz)} \
    -bind [list baz [db_nullify_empty_string $baz]]
The database API allows for direct caching of query results. Repeated calls will return the cached value until it is either explicitly flushed using db_flush_cache, times out (the timeout is set when ns_cache is called to create the cache), or another cached query fills the cache, causing older entries to be flushed.
Values returned by a query are cached if you pass the "-cache_key" switch to the database procedure. The switch value will be used as the key in the ns_cache eval call used to execute the query and processing code. The db_flush_cache proc should be called to flush the cache when appropriate. The "-cache_pool" parameter can be used to specify the cache pool to be used, and defaults to db_cache_pool. The size of the default cache is governed by the kernel parameter "DBCacheSize" in the "caching" section.
Currently db_string, db_list, db_list_of_lists, db_1row, db_0or1row, and db_multirow support caching.
For caching to be effective, one must carefully design a cache_pool and cache_key strategy that uniquely identifies a query within the system, including the relevant objects being referenced by the query. Typically a cache_key should include one or more object_ids and a name that identifies the operation being done.
Here is an example from the layout-manager package:
# Query to return the elements of a page as a list. The prefix "page_" is used to denote
# that this is a page-related query, page_id is used to uniquely identify the query
# by object, and the suffix uniquely defines the operation being performed on the
# page object.
db_list -cache_key page_${page_id}_get_elements get_elements {}

# When the contents of a page are changed, we flush all page-related queries for the given
# page object using db_flush_cache.
db_flush_cache -cache_key_pattern page_${page_id}_*