Database Access API
by Jon SalzTcl procedures: /packages/acs-kernel/10-database-procs.tclTcl initialization: /packages/acs-kernel/database-init.tclThe Big Picture
One of OpenACS's great strengths is that code written for it is very close to
the database. It is very easy to interact with the database from anywhere
within OpenACS. Our goal is to develop a coherent API for database access which
makes this even easier.
There were four significant problems with the way OpenACS previously used the
database (i.e., directly through the ns_db interface):Handle management. We required code to pass database
handles around, and for routines which needed to perform database access but
didn't receive a database handle as input, it was difficult to know from
which of the three "magic pools" (main, subquery, and log) to
allocate a new handle.
Nested transactions. In our Oracle driver, begin
transaction really means "turn auto-commit mode off" and
end transaction means "commit the current transaction and
turn auto-commit mode on." Thus if transactional code needed to call a
routine which needed to operate transactionally, the semantics were
non-obvious. Consider:
proc foo { db args } {
db_transaction {
...
}
}
db_transaction {
db_dml unused "insert into greeble(bork) values(33)"
foo $db
db_dml unused "insert into greeble(bork) values(50)"
}
This would insert greeble #33 and do all the stuff in foo
transactionally, but the end transaction in foo
would actually cause a commit, and greeble #50 would later be inserted in
auto-commit mode. This could cause subtle bugs: e.g., in the case that the
insert for greeble #50 failed, part of the "transaction" would have
already have been committed!. This is not a good thing.
Unorthodox use of variables. The standard mechanism for
mapping column values into variables involved the use of the
set_variables_after_query routine, which relies on an uplevel
variable named selection (likewise for
set_variables_after_subquery and subselection).
Hard-coded reliance on Oracle. It's difficult to
write code supporting various different databases (dynamically using the
appropriate dialect based on the type of database being used, e.g., using
DECODE on Oracle and CASE ... WHEN on
Postgres).
The Database Access API addresses the first three problems by:
making use of database handles transparentwrapping common database operations (including transaction management) in
Tcl control structures (this is, after all, what Tcl is good at!)
It lays the groundwork for addressing the fourth problem by assigning each
SQL statement a logical name. In a future version of the OpenACS Core, this API
will translate logical statement names into actual SQL, based on the type of
database in use. (To smooth the learning curve, we provide a facility for
writing SQL inline for a "default SQL dialect", which we assume to
be Oracle for now.)
To be clear, SQL abstraction is not fully implemented in OpenACS
3.3.1. The statement names supplied to each call are not used by the API at
all. The API's design for SQL abstraction is in fact incomplete;
unresolved issues include:how to add WHERE clause criteria dynamicallyhow to build a dynamic ORDER BY clause (Ben Adida has a
proposed solution for this)how to define a statement's formal interface (i.e., what bind
variables it expects, what columns its SELECT clause must
contain if it's a query) without actually implementing the statement in a
specific SQL dialect
So why is the incremental change of adding statement naming to the API worth
the effort? It is worth the effort because we know that giving each SQL
statement a logical name will be required by the complete SQL abstraction
design. Therefore, we know that the effort will not be wasted, and taking
advantage of the new support for bind variables will already require code
that uses 3.3.0 version of the API to be updated.
The Bell Tolls for set_variables_after_queryset_variables_after_query is gone! (Well, it's still there,
but you'll never need to use it.) The new API routines set local
variables automatically. For instance:
db_1row select_names "select first_names, last_name from users where user_id = [ad_get_user_id]"
doc_body_append "Hello, $first_names $last_name!"
Like ns_db 1row, this will bomb if the query doesn't return
any rows (no such user exists). If this isn't what you want, you can
write:
if { [db_0or1row select_names "select first_names, last_name from users where user_id = [ad_get_user_id]"] } {
doc_body_append "Hello, $first_names $last_name!"
} else {
# Executed if the query returns no rows.
doc_body_append "There's no such user!"
}
Selecting a bunch of rows is a lot prettier now:
db_foreach select_names "select first_names, last_name from users" {
doc_body_append "Say hi to $first_names $last_name for me!<br>"
}
That's right, db_foreach is now like ns_db
select plus a while loop plus
set_variables_after_query plus an if statement
(containing code to be executed if no rows are returned).
db_foreach select_names "select first_names, last_name from users where last_name like 'S%'" {
doc_body_append "Say hi to $first_names $last_name for me!<br>"
} if_no_rows {
doc_body_append "There aren't any users with last names beginnings with S!"
}
Handle Management
The new API keeps track of which handles are in use, and automatically
allocates new handles when they are necessary (e.g., to perform subqueries
while a select is active). For example:
doc_body_append "<ul>"
db_foreach select_names "select first_names, last_name, user_id from users" {
# Automatically allocated a database handle from the main pool.
doc_body_append "<li>User $first_names $last_name\n<ul>"
db_foreach select_groups "select group_id from user_group_map where user_id = $user_id" {
# There's a selection in progress, so we allocated a database handle
# from the subquery pool for this selection.
doc_body_append "<li>Member of group #$group_id.\n"
} if_no_rows {
# Not a member of any groups.
doc_body_append "<li>Not a member of any group.\n"
}
}
doc_body_append "</ul>"
db_release_unused_handles
A new handle isn't actually allocated and released for every selection,
of course - as a performance optimization, the API keeps old handles around
until db_release_unused_handles is invoked (or the script
terminates).
Note that there is no analogue to ns_db gethandle - the
handle is always automatically allocated the first time it's needed.Bind VariablesIntroduction
Most SQL statements require that the code invoking the statement pass along
data associated with that statement, usually obtained from the user. For
instance, in order to delete a WimpyPoint presentation, a Tcl script might
use the SQL statement
delete from wp_presentations where presentation_id = some_presentation_id
where some_presentation_id is a number which is a valid
presentation ID of the presentation I want to delete. It's easy to write
code handling situations like this since SQL statements can include
bind variables, which represent placeholders for actual
data. A bind variable is specified as a colon followed by an identifier, so
the statement above can be coded as:
db_dml presentation_delete {
delete from wp_presentations where presentation_id = :some_presentation_id
}
When this SQL statement is invoked, the value for the bind variable
:some_presentation_id is pulled from the Tcl variable
$some_presentation_id (in the caller's environment). Note
that bind variables are not limited to one per statement; you can use an
arbitrary number, and each will pull from the correspondingly named Tcl
variable. (Alternatively, you can also specify an list or ns_set
providing bind variables' values; see Usage.)
The value of a bind variable is taken literally by the database driver, so
there is never any need to put single-quotes around the value for a bind
variable, or to use db_quote to escape single-quotes contained
in the value. The following works fine, despite the apostrophe:
set exclamation "That's all, folks!"
db_dml exclamation_insert { insert into exclamations(exclamation) values(:exclamation) }
Note that you can use a bind variable in a SQL statement only where you
could use a literal (a number or single-quoted string). Bind variables cannot
be placeholders for things like SQL keywords, table names, or column names,
so the following will not work, even if $table_name is set
properly:
select * from :table_name
Why Bind Variables Are Useful
Why bother with bind variables at all - why not just write the Tcl statement
above like this:
db_dml presentation_delete "
delete from wp_presentations where presentation_id = $some_presentation_id
"
(Note the use of double-quotes to allow the variable reference to
$some_presentation_id to be interpolated in.) This will work,
but consider the case where some devious user causes
some_presentation_id to be set to something like '3 or
1 = 1', which would result in the following statement being
executed:
delete from wp_presentations where presentation_id = 3 or 1 = 1
This deletes every presentation in the database! Using bind variables
eliminates this gaping security hole: since bind variable values are taken
literally. Oracle will attempt to delete presentations whose presentation ID
is literally '3 or 1 = 1' (i.e., no presentations, since
'3 or 1 = 1' can't possibly be a valid integer
primary key for wp_presentations. In general, since Oracle
always considers the values of bind variables to be literals, it becomes more
difficult for users to perform URL surgery to trick scripts into running
dangerous queries and DML.
UsageEvery db_* command accepting a SQL command as an argument
supports bind variables. You can eitherspecify the -bind switch to provide a set with bind variable
values, orspecify the -bind switch to explicitly provide a list of
bind variable names and values, ornot specify a bind variable list at all, in which case Tcl variables are
used as bind variables.
The default behavior (i.e., if the -bind switch is omitted) is
that these procedures expect to find local variables that correspond in name
to the referenced bind variables, e.g.:
set user_id 123456
set role "administrator"
db_foreach user_group_memberships_by_role {
select g.group_id, g.group_name
from user_groups g, user_group_map map
where g.group_id = map.user_id
and map.user_id = :user_id
and map.role = :role
} {
# do something for each group of which user 123456 is in the role
# of "administrator"
}
The value of the local Tcl variable user_id (123456) is bound to
the user_id bind variable.
The -bind switch can takes the name of an ns_set
containing keys for each bind variable named in the query, e.g.:
set bind_vars [ns_set create]
ns_set put $bind_vars user_id 123456
ns_set put $bind_vars role "administrator"
db_foreach user_group_memberships_by_role {
select g.group_id, g.group_name
from user_groups g, user_group_map map
where g.group_id = map.user_id
and map.user_id = :user_id
and map.role = :role
} -bind $bind_vars {
# do something for each group in which user 123456 has the role
# of "administrator"
}
Alternatively, as an argument to -bind you can specify a list of
alternating name/value pairs for bind variables:
db_foreach user_group_memberships_by_role {
select g.group_id, g.group_name
from user_groups g, user_group_map map
where g.group_id = map.user_id
and map.user_id = :user_id
and map.role = :role
} -bind [list user_id 123456 role "administrator"] {
# do something for each group in which user 123456 has the role
# of "administrator"
}
Nulls and Bind Variables
When processing a DML statement, Oracle coerces empty strings into
null. (This coercion does not occur in the
WHERE clause of a query, i.e.
col = '' and
col is null are not equivalent.)
As a result, when using bind variables, the only way to make Oracle set a
column value to null is to set the corresponding bind variable
to the empty string, since a bind variable whose value is the string
"null" will be interpreted as the literal string
"null".These Oracle quirks complicate the process of writing clear and abstract
DML difficult. Here is an example that illustrates why:
#
# Given the table:
#
# create table foo (
# bar integer,
# baz varchar(10)
# );
#
set bar ""
set baz ""
db_dml foo_create "insert into foo(bar, baz) values(:bar, :baz)"
#
# the values of the "bar" and "baz" columns in the new row are both
# null, because Oracle has coerced the empty string (even for the
# numeric column "bar") into null in both cases
Since databases other than Oracle do not coerce empty strings into
null, this code has different semantics depending on the
underlying database (i.e., the row that gets inserted may not have null as
its column values), which defeats the purpose of SQL abstraction.
Therefore, the Database Access API provides a database-independent way to
represent null (instead of the Oracle-specific idiom of the
empty string): db_null.Use it instead of the empty string whenever you want to set a column value
explicitly to null, e.g.:
set bar [db_null]
set baz [db_null]
db_dml foo_create "insert into foo(bar, baz) values(:bar, :baz)"
#
# sets the values for both the "bar" and "baz" columns to null
SQL Abstraction
We now require that each SQL statement be assigned a logical name for the
statement that is unique to the procedure or page in which it is defined.
This is so that (eventually) we can implement logically named statements with
alternative SQL for non-Oracle databases (e.g., Postgres). More on this
later.
Placing Column Values in Arrays and Sets
Normally, db_foreach, db_0or1row, and
db_1row places the results of queries in Tcl variables, so you
can say:
db_foreach users_select "select first_names, last_name from users" {
doc_body_append "<li>$first_names $last_name\n"
}
However, sometimes this is not sufficient: you may need to examine the rows
returned, to dynamically determine the set of columns returned by the query,
or to avoid collisions with existing variables. You can use the
-column_array and -column_set switches to
db_foreach, db_0or1row, and db_1row to
instruct the database routines to place the results in a Tcl array or
ns_set, respectively, where the keys are the column names and
the values are the column values. For example:
db_foreach users_select "select first_names, last_name from users" -column_set columns {
# Now $columns is an ns_set.
doc_body_append "<li>"
for { set i 0 } { $i < [ns_set size $columns] } { incr i } {
doc_body_append "[ns_set key $columns $i] is [ns_set value $columns $i]. \n"
}
}
will write something like:
first_names is Jon. last_name is Salz.first_names is Lars. last_name is Pind.first_names is Michael. last_name is Yoon.Sequence Pooling
The database library can transparently maintain pools of sequence values, so
that each request for a new sequence value (using db_nextval)
does not incur a roundtrip to the server. For instance, this functionality is
very useful in the security/sessions library, which very frequently allocates
values from the sec_id_seq sequence. To utilize this
functionality for a particular sequence, register the sequence to be pooled,
either using the db_register_pooled_sequence procedure at server
startup time, or by including a configuration parameter of the form
PoolSequence.sequence_name_seq=count
in any configuration section in the yourservername.ini
file, e.g., e.g.,
[ns/server/yourservername/acs/security]
PoolSequence.sec_id_seq=20
The database library will allocate this number of sequence values at server
startup. It will periodically scan pools and allocate new values for
sequences which are less than half-full. (This normally occurs every 60
seconds, and is configurable via the
PooledSequenceUpdateInterval parameter in the
[ns/server/yourservername/acs/database] configuration
section.)
API
Note that you never have to use ns_db anymore (including
ns_db gethandle)! Just start doing stuff, and (if you want) call
db_release_unused_handles when you're done as a hint to
release the database handle.
db_abort_transactiondb_abort_transactionAborts all levels of a transaction. That is if this is called within
several nested transactions, all of them are terminated. Use this insetead of
db_dml "abort" "abort transaction".
db_nulldb_nullReturns a value which can be used in a bind variable to represent the SQL
value null. See Nulls and Bind Variables
above.db_foreachdb_foreachstatement-name sql [ -bind bind_set_id | -bind bind_value_list ] \
[ -column_array array_name | -column_set set_name ] \
code_block [ if_no_rows if_no_rows_block ]Performs the SQL query sql, executing
code_block once for each row with variables set to
column values (or a set or array populated if -column_array or
column_set is specified). If the query returns no rows, executes
if_no_rows_block (if provided). Example:
db_foreach select_foo "select foo, bar from greeble" {
doc_body_append "<li>foo=$foo; bar=$bar\n"
} if_no_rows {
doc_body_append "<li>There are no greebles in the database.\n"
}
The code block may contain break statements (which terminate the
loop and flush the database handle) and continue statements
(which continue to the next row of the loop). db_1rowdb_1rowstatement-namesql [ -bind bind_set_id | -bind bind_value_list ] \
[ -column_array array_name | -column_set set_name ]
Performs the SQL query sql, setting variables to
column values. Raises an error if the query does not return exactly 1 row. Example:
db_1row select_foo "select foo, bar from greeble where greeble_id = $greeble_id"
# Bombs if there's no such greeble!
# Now $foo and $bar are set.
db_0or1rowdb_0or1rowstatement-namesql [ -bind bind_set_id | -bind bind_value_list ] \
[ -column_array array_name | -column_set set_name ]
Performs the SQL query sql. If a row is returned,
sets variables to column values and returns 1. If no rows are returned,
returns 0. If more than one row is returned, throws an error. db_nextvaldb_nextvalsequence-nameReturns the next value for the sequence sequence-name (using a
SQL statement like SELECTsequence-name.nextval FROM
DUAL). If sequence pooling is enabled for the sequence, transparently
uses a value from the pool if available to save a round-trip to the database
(see ).
db_register_pooled_sequencedb_register_pooled_sequencesequence-namepool-sizeRegisters the sequence sequence-name to be pooled, with a pool
size of pool-size sequence values (see ).
db_stringdb_stringstatement-namesql [ -default default ] [ -bind bind_set_id | -bind bind_value_list ]
Returns the first column of the result of SQL query
sql. If sql doesn't return a
row, returns default (or throws an error if
default is unspecified). Analogous to
database_to_tcl_string and
database_to_tcl_string_or_null.
db_listdb_liststatement-namesql [ -bind bind_set_id | -bind bind_value_list ]
Returns a Tcl list of the values in the first column of the result of SQL
query sql. If sql doesn't
return any rows, returns an empty list. Analogous to
database_to_tcl_list.
db_list_of_listsdb_list_of_listsstatement-namesql [ -bind bind_set_id | -bind bind_value_list ]
Returns a Tcl list, each element of which is a list of all column values
in a row of the result of SQL query sql. If
sql doesn't return any rows, returns an empty list.
(Analogous to database_to_tcl_list_list.)
db_dmldb_dmlstatement-namesql \
[ -bind bind_set_id | -bind bind_value_list ] \
[ -blobs blob_list | -clobs clob_list |
-blob_files blob_file_list | -clob_files clob_file_list ]
Performs the DML or DDL statement sql. If a length-n list of blobs or clobs is provided, then the SQL
should return n blobs or clobs into the bind variables
:1, :2, ... :n.
blobs or clobs, if specified,
should be a list of individual BLOBs or CLOBs to insert;
blob_files or clob_files, if
specified, should be a list of paths to files containing the data to
insert. Only one of -blobs, -clobs,
-blob_files, and -clob_files may be provided.Example:
db_dml insert_photos "
insert photos(photo_id, image, thumbnail_image)
values(photo_id_seq.nextval, empty_blob(), empty_blob())
returning image, thumbnail_image into :1, :2
" -blob_files [list "/var/tmp/the_photo" "/var/tmp/the_thumbnail"]
This inserts a new row into the photos table, with the contents
of the files /var/tmp/the_photo and
/var/tmp/the_thumbnail in the image and
thumbnail columns, respectively.
db_write_clob,
db_write_blob,
db_blob_get_filedb_write_clobstatement-namesql [ -bind bind_set_id | -bind bind_value_list ]
db_write_blobstatement-namesql [ -bind bind_set_id | -bind bind_value_list ]
db_blob_get_filestatement-namesql [ -bind bind_set_id | -bind bind_value_list ]
Analagous to ns_ora write_clob/write_blob/blob_get_file.
db_release_unused_handlesdb_release_unused_handlesReleases any allocated, unused database handles. db_transactiondb_transactioncode_block [ on_error { code_block } ]
Executes code_block transactionally. Nested
transactions are supported (end transaction is transparently
ns_db dml'ed when the outermost transaction completes). The
db_abort_transaction command can be used to abort all levels of
transactions. It is possible to specify an optional on_error
code block that will be executed if some code in code_block throws
an exception. The variable errmsg will be bound in that scope.
If there is no on_error code, any errors will be propagated. Example:
proc replace_the_foo { col } {
db_transaction {
db_dml "delete from foo"
db_dml "insert into foo(col) values($col)"
}
}
proc print_the_foo {} {
doc_body_append "foo is [db_string "select col from foo"]<br>\n"
}
replace_the_foo 8
print_the_foo ; # Writes out "foo is 8"
db_transaction {
replace_the_foo 14
print_the_foo ; # Writes out "foo is 14"
db_dml "insert into some_other_table(col) values(999)"
...
db_abort_transaction
} on_error {
doc_body_append "Error in transaction: $errmsg"
}
print_the_foo ; # Writes out "foo is 8"
db_resultrowsdb_resultrowsReturns the number of rows affected or returned by the previous
statement.
db_with_handledb_with_handlevarcode_blockPlaces a database handle into the variable var and
executes code_block. This is useful when you don't
want to have to use the new API (db_foreach,
db_1row, etc.), but need to use database handles explicitly. Example:
proc lookup_the_foo { foo } {
db_with_handle db {
return [db_string unused "select ..."]
}
}
db_with_handle db {
# Now there's a database handle in $db.
set selection [ns_db select $db "select foo from bar"]
while { [ns_db getrow $db $selection] } {
set_variables_after_query
lookup_the_foo $foo
}
}
db_nullify_empty_stringdb_nullify_empty_stringstringFor true SQL purists, we provide the convenience function
db_nullify_empty_string, which returns
[db_null] if its string argument is the empty string
and can be used to encapsulate another Oracle quirk:
set baz ""
# Clean out the foo table
#
db_dml unused "delete from foo"
db_dml unused "insert into foo(baz) values('$baz')"
set n_rows [db_string unused "select count(*) from foo where baz is null"]
#
# $n_rows is 1; in effect, the "baz is null" criterion is matching
# the empty string we just inserted (because of Oracle's coercion
# quirk)
To balance out this asymmetry, you can explicitly set baz to
null by writing:
db_dml foo_insert "insert into foo(baz) values(:1)" {[db_nullify_empty_string $baz]}
($Id: db-api.xml,v 1.4.2.1 2002/05/15 23:26:19 vinodk Exp $)