package XML::ValidWriter ;
=head1 NAME
XML::ValidWriter - DOCTYPE driven valid XML output
=head1 SYNOPSIS
## As a normal perl object:
$writer = XML::ValidWriter->new(
DOCTYPE => $xml_doc_type,
OUTPUT => \*FH
) ;
$writer->startTag( 'b1' ) ;
$writer->startTag( 'c2' ) ;
$writer->end ;
## Writing to a scalar:
$writer = XML::ValidWriter->new(
DOCTYPE => $xml_doc_type,
OUTPUT => \$buf
) ;
## Or, in scripting mode:
use XML::Doctype NAME => 'a', SYSTEM_ID => 'a.dtd' ;
use XML::ValidWriter qw( :all :dtd_tags ) ;
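b1 ;                 # Emits the b1 start tag, preceded by any missing start tags
c2( attr=>"val" ) ;  # Emits any end/start tags needed, then the c2 start tag
endAllTags ;         # Emits end tags for all open elements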
## If you've got an XML::Doctype object handy:
use XML::ValidWriter qw( :dtd_tags ), DOCTYPE => $doctype ;
## If you've saved a preparsed DTD as a perl module
use FooML::Doctype::v1_0001 ;
use XML::ValidWriter qw( :dtd_tags ) ;
#
# This all assumes that the DTD contains:
#
#
#
#
#
#
=head1 STATUS
Alpha. Use and patch, don't depend on things not changing drastically.
Many methods supplied by XML::Writer are not yet supplied here.
=head1 DESCRIPTION
This module uses the DTD contained in an XML::Doctype to enable compile-
and run-time checks of XML output validity. It also provides methods and
functions named after the elements mentioned in the DTD. If an
XML::ValidWriter uses a DTD that mentions the element type TABLE, that
instance will provide the methods
$writer->TABLE( $content, ...attrs... ) ;
$writer->start_TABLE( ...attrs... ) ;
$writer->end_TABLE() ;
$writer->empty_TABLE( ...attrs... ) ;
These methods are also created for undeclared elements, that is, elements
that are mentioned in the DTD but not explicitly declared with an
C<< <!ELEMENT> >> declaration. If an element type name conflicts with an
existing method, the element does not override the internal method.
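The fields list in this file includes a cache of AUTOLOADed methods, which suggests the per-element methods are resolved through Perl's AUTOLOAD. The following is a minimal sketch of that general technique on a hypothetical ToyWriter class (not XML::ValidWriter itself): it maps C<start_X>, C<end_X>, C<empty_X> and C<X> onto startTag(), endTag() and emptyTag() for element types listed at construction time. It does no validation, no escaping and no method caching. Because Perl only consults AUTOLOAD when no real method matches, an element type name can never shadow a built-in method.

```perl
use strict;
use warnings;

package ToyWriter;    # hypothetical stand-in, not XML::ValidWriter
use Carp qw(croak);
our $AUTOLOAD;

sub new {
    my ( $class, %elements ) = @_;    # element type name => 1
    return bless { ELEMENTS => \%elements, OUT => '' }, $class;
}

sub startTag { my ( $self, $name ) = @_; $self->{OUT} .= "<$name>" }
sub endTag   { my ( $self, $name ) = @_; $self->{OUT} .= "</$name>" }
sub emptyTag { my ( $self, $name ) = @_; $self->{OUT} .= "<$name />" }

sub AUTOLOAD {
    my $self = shift;
    ( my $method = $AUTOLOAD ) =~ s/.*:://;
    return if $method eq 'DESTROY';

    # AUTOLOAD only runs when no real method matched, so an element
    # named e.g. "startTag" can never shadow the built-in method.
    my ( $prefix, $elt ) = $method =~ /^(start_|end_|empty_)?(.+)\z/s;
    croak "No element type '$elt' in the DTD" unless $self->{ELEMENTS}{$elt};
    $prefix //= '';

    return $self->startTag($elt) if $prefix eq 'start_';
    return $self->endTag($elt)   if $prefix eq 'end_';
    return $self->emptyTag($elt) if $prefix eq 'empty_';

    # Bare element name: start tag, content, end tag (content unescaped here).
    my $content = shift // '';
    $self->startTag($elt);
    $self->{OUT} .= $content;
    return $self->endTag($elt);
}

package main;

my $w = ToyWriter->new( TABLE => 1, TR => 1 );
$w->start_TABLE;
$w->TR('row');
$w->end_TABLE;
print $w->{OUT}, "\n";    # <TABLE><TR>row</TR></TABLE>
```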
When an XML::Doctype is parsed, the name of the doctype defines the root
node of the document. This name can be changed, though, see L
for details.
In addition to the object-oriented API, a function API is also provided.
This allows you to import most of the methods of XML::ValidWriter as functions
using standard import specifications:
use XML::ValidWriter qw( :all ) ; ## Could list function names instead
C<:all> does not import the functions named after elements mentioned in
the DTD; import those with C<:dtd_tags>:
use XML::Doctype NAME => 'foo', SYSTEM_ID => 'fooml.dtd' ;
use XML::ValidWriter qw( :all :dtd_tags ) ;
or
BEGIN {
$doctype = XML::Doctype->new( ... ) ;
}
use XML::ValidWriter DOCTYPE => $doctype, qw( :all :dtd_tags ) ;
=head2 XML::Writer API compatibility
Much of the interface is patterned
after XML::Writer so that it can possibly be used as a drop-in
replacement. It will take a while before this module emulates enough
of XML::Writer to be a drop-in replacement in situations where the
more advanced XML::Writer methods are used. If you find you need
a method not supported here, write it and send it in!
This was not derived from XML::Writer because XML::Writer does not
expose its stack. Even if it did, it might be difficult to store
enough state in its stack.
Unlike XML::Writer, this does not call in all of the IO::* family, and
method dispatch should be faster. DTD-specific methods are also supported
(see L).
=head2 Quick and Easy Unix Filter Apps
For quick applications that provide Unix filter application
functionality, XML::ValidWriter and XML::Doctype cooperate to allow you
to
=over
=item 1
Parse a DTD at compile-time and set that as the default DTD for
the current package. This is done using the
use XML::Doctype NAME => 'FooML', SYSTEM_ID => 'fooml.dtd' ;
syntax.
=item 2
Define and export a set of functions corresponding to start and end tags for
all declared and undeclared ELEMENTs in the DTD. This is done by using
the C<:dtd_tags> export symbol like so:
use XML::Doctype NAME => 'FooML', SYSTEM_ID => 'fooml.dtd' ;
use XML::ValidWriter qw(:dtd_tags) ;
If the elements a, b_c, and d-e are referred to in the DTD, the following
functions will be exported:
a() end_a() # like startTag( 'a', ... ) and endTag( 'a' )
b_c() end_b_c()
d_e() end_d_e() &{'d-e'}() &{'end_d-e'}()
These functions emit only tags, unlike the similar functions found
in CGI.pm and XML::Generator, which also allow you to pass content
in as parameters.
See below for details on conflict resolution in the mapping of element
type names containing /\W/ to Perl subroutine names.
If the elements declared in the DTD might conflict with functions
in your package namespace, simply put them in some safe namespace:
package FooML ;
use XML::Doctype NAME => 'FooML', SYSTEM_ID => 'fooml.dtd' ;
use XML::ValidWriter qw(:dtd_tags) ;
package Whatever ;
The advantage of importing these subroutine names is that perl
can then detect use of unknown tags at compile time.
If you don't want to use the default DTD, use the C<-dtd> option:
BEGIN { $dtd = XML::Doctype->new( .... ) }
use XML::ValidWriter qw(:dtd_tags), -dtd => \$dtd ;
=item 3
Use the default DTD to validate emitted XML. startTag() and endTag()
will check the tag being emitted against the list of currently open
tags and either emit a minimal set of missing end and start tags
necessary to achieve document validity or produce errors or warnings.
Since the functions created by the C<:dtd_tags> export symbol are wrappers
around startTag() and endTag(), they provide this functionality as well.
So, if you have a DTD like
you can do this:
use XML::Doctype NAME => 'a', SYSTEM_ID => 'a.dtd' ;
use XML::ValidWriter ':dtd_tags' ;
getDoctype->element_decl('a')->attdef('aa1')->default_on_write('foo') ;
a ;
b1 ;
c1 ;
end_c1 ;
end_b1 ;
b3 ;
c3( -attr => "val" ) ;
end_c3 ;
end_b3 ;
end_a ;
and emit a document like
"val" />
.
=back
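Item 3 above describes startTag() emitting a minimal set of missing end and start tags to keep the document valid. The following is a deliberately simplified Perl sketch of that idea, assuming a hypothetical DTD in which every element has exactly one permitted parent. The names C<%parent_of>, C<start_tag> and C<end_all_tags> are illustrative only; the real module works from full content models (sequences, choices, C<?>, C<*>, C<+>) and also reports errors and warnings, which this sketch does not.

```perl
use strict;
use warnings;

# Hypothetical DTD: every element has exactly one permitted parent.
my %parent_of = (
    b1 => 'a',  b2 => 'a',  b3 => 'a',
    c1 => 'b1', c2 => 'b2', c3 => 'b3',
);

my @stack;       # currently open elements, root first
my $out = '';    # markup emitted so far

# Required ancestors of $elt, root first.
sub ancestors {
    my ($elt) = @_;
    my @chain;
    while ( defined( my $p = $parent_of{$elt} ) ) {
        unshift @chain, $p;
        $elt = $p;
    }
    return @chain;
}

sub start_tag {
    my ($elt) = @_;
    my @need = ancestors($elt);

    # Close open elements that are not on $elt's ancestor chain.
    while ( @stack && ( @stack > @need || $stack[-1] ne $need[$#stack] ) ) {
        $out .= '</' . pop(@stack) . '>';
    }

    # Open whichever ancestors are still missing, then $elt itself.
    for my $anc ( @need[ scalar(@stack) .. $#need ], $elt ) {
        push @stack, $anc;
        $out .= "<$anc>";
    }
}

sub end_all_tags {
    $out .= '</' . pop(@stack) . '>' while @stack;
}

start_tag('b1');
start_tag('c2');
end_all_tags();
print "$out\n";    # <a><b1></b1><b2><c2></c2></b2></a>
```

Because each element here has a single parent, the open-element stack is always a chain from the root, so comparing only the top of the stack against the required chain is enough to decide what to close.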
=head1 OUTPUT OPTIMIZATION
XML is a very simple language and does not offer a lot of room for
optimization. As the spec says "Terseness in XML markup is of
minimal importance." XML::ValidWriter does optimize the following
on output:
C<< <a...></a> >> becomes C<< <a... /> >>.
Spurious emissions of C<< ]]><![CDATA[ >> are suppressed.
XML::ValidWriter chooses whether to use a C<< <![CDATA[...]]> >> section
or simply escape '<' and '&'. If you are emitting content for an element
in multiple calls to L</characters>, the first call decides whether or not to use
CDATA, so it's to your advantage to emit as much in the first call
as possible. You can do
characters( @lots_of_segments ) ;
if it helps.
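The fields list in this file describes a C<STRAGGLERS> slot holding '>' right after a start tag, which hints that the closing '>' is held back until the writer knows whether an end tag follows immediately. The following is a minimal Perl sketch of both optimizations under that assumption. The "pick whichever encoding is shorter" rule for CDATA is a hypothetical heuristic, not necessarily the one XML::ValidWriter uses, and unlike the module this sketch decides per call and does not merge adjacent CDATA sections. All sub names are illustrative.

```perl
use strict;
use warnings;

my $out        = '';   # everything emitted so far
my $pending_gt = 0;    # set right after a start tag whose '>' is held back

# Emit a held-back '>' before anything other than the matching end tag.
sub flush_gt {
    if ($pending_gt) { $out .= '>'; $pending_gt = 0 }
}

sub start_tag {
    my ( $name, %attrs ) = @_;
    flush_gt();
    $out .= "<$name";
    # Attribute values are not escaped in this sketch.
    $out .= qq{ $_="$attrs{$_}"} for sort keys %attrs;
    $pending_gt = 1;
}

sub end_tag {
    my ($name) = @_;
    if ($pending_gt) {    # nothing was emitted inside: collapse to an empty tag
        $out .= ' />';
        $pending_gt = 0;
    }
    else {
        $out .= "</$name>";
    }
}

sub as_escaped {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/]]>/]]&gt;/g;    # "]]>" may not appear literally in content
    return $text;
}

sub as_cdata {
    my ($text) = @_;
    $text =~ s/]]>/]]]]><![CDATA[>/g;    # split "]]>" across two sections
    return '<![CDATA[' . $text . ']]>';
}

# Hypothetical heuristic: pick whichever encoding is shorter.
sub characters {
    my $text = join '', @_;
    return unless length $text;    # emitting nothing keeps <a></a> collapsible
    flush_gt();
    my ( $esc, $cdata ) = ( as_escaped($text), as_cdata($text) );
    $out .= length($cdata) < length($esc) ? $cdata : $esc;
}

start_tag( 'a', aa1 => 'foo' );
start_tag('b');
end_tag('b');                # collapses to <b />
characters('x < y');         # escaping is shorter here
characters('<<<<&&&&');      # a CDATA section is shorter here
end_tag('a');
print "$out\n";    # <a aa1="foo"><b />x &lt; y<![CDATA[<<<<&&&&]]></a>
```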
=cut
use strict ;
use vars qw( $VERSION @ISA @EXPORT_OK %EXPORT_TAGS ) ;
use fields (
'AT_BOL', # Set if the last thing emitted was a "\n".
'CDATA_END_PART', # ']' or ']]' if we're in CDATA mode and the last parm
# to the last call to characters() ended in this.
'CHECKED_XML_DECL',
'FILE_NAME', # set if the constructor received OUTPUT => 'foo.barml'
'CREATED_AT', # File and line number the instance was created at
'DATA_MODE', # Whether or not to be in data mode
'DOCTYPE', # The parsed DOCTYPE & DTD
'EMITTED_DOCTYPE',
'EMITTED_ROOT',
'EMITTED_XML',
'IS_STANDALONE',
'METHODS', # Cache of AUTOLOADed methods
'OUTPUT', # The output filehandle
'STACK', # The array of open elements
'SHOULD_WARN', # Turns on warnings for things that should (but may not be)
# the case, like emitting ''. defaults to '1'.
'WAS_END_TAG', # Set if last thing emitted was an empty tag or an end tag
'STRAGGLERS', # '>' if we just emitted a start tag, ']]>' if
## If it's a reference to anything but a plain old hash, then the
## first param is either an XML::ValidWriter, a reference to a glob,
## a reference to a SCALAR, or a reference to an IO::Handle.
return shift if ( @_ && ref $_[0] && isa( $_[0], 'XML::ValidWriter' ) ) ;
my $callpkg = caller(1) ;
croak "No default XML::ValidWriter declared for package '$callpkg'"
unless $pkg_writers{$callpkg} ;
return $pkg_writers{$callpkg} ;
}
=head1 METHODS AND FUNCTIONS
All of the routines in this module can be called as either functions
or methods unless otherwise noted.
To call these routines as functions use either the DOCTYPE or
:dtd_tags options in the parameters to the use statement:
use XML::ValidWriter DOCTYPE => XML::Doctype->new( ... ) ;
use XML::ValidWriter qw( :dtd_tags ) ;
This associates an XML::ValidWriter and an XML::Doctype with the
package. These are used by the routines when called as functions.
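The dispatch code earlier in this file either takes an XML::ValidWriter passed as the first argument or looks up C<$pkg_writers{$callpkg}> using C<caller(1)>. The following is a minimal sketch of that pattern on a hypothetical ToyWriter class; C<set_default> and C<_self_or_default> are illustrative names, not this module's API. The helper is called with the C<&name;> form so that it shares the caller's C<@_>, which lets it C<shift> off the invocant when one is present.

```perl
use strict;
use warnings;

package ToyWriter;    # hypothetical stand-in, not XML::ValidWriter
use Carp qw(croak);
use Scalar::Util qw(blessed);

my %pkg_writers;    # package name => that package's default writer

sub new { my $class = shift; return bless { OUT => '' }, $class }

# Make $writer the default for whichever package calls this.
sub set_default {
    my ( $class, $writer ) = @_;
    my $pkg = caller;
    $pkg_writers{$pkg} = $writer;
}

# Must be called as &_self_or_default; so that it shares (and can
# shift) its caller's @_.
sub _self_or_default {
    return shift if @_ && blessed( $_[0] ) && $_[0]->isa(__PACKAGE__);
    my $callpkg = caller(1);    # package that called the public routine
    croak "No default writer declared for package '$callpkg'"
        unless $pkg_writers{$callpkg};
    return $pkg_writers{$callpkg};
}

sub startTag {
    my $self = &_self_or_default;
    my $name = shift;
    $self->{OUT} .= "<$name>";
}

package main;

my $w = ToyWriter->new;
ToyWriter->set_default($w);
*main::startTag = \&ToyWriter::startTag;    # roughly what an import() would do

$w->startTag('a');        # method call
startTag('b');            # function call, resolved to main's default writer
print $w->{OUT}, "\n";    # <a><b>
```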
=over
=item new
$writer = XML::ValidWriter->new( DOCTYPE => $doctype, OUTPUT => \*FH ) ;
Creates an XML::ValidWriter.
The value passed for OUTPUT may be:
=over
=item a SCALAR ref
if you want to append output to a scalar. This scalar is
truncated whenever the XML::ValidWriter object is reset() or
DESTROY()ed.
=item a file handle glob ref or a reference to an IO object
XML::ValidWriter does not load IO. This is
the only mode compatible with XML::Writer.
=item a file name
A simple scalar is taken to be a filename to be created or truncated
and emitted to. This file will be closed when the XML::ValidWriter object
is reset or destroyed.
=back
NOTE: if you leave OUTPUT undefined, then the currently select()ed
output is used at each emission (i.e., calling select() can alter the
destination mid-stream). This eases writing command-line filter
applications, but the select() interaction is unintentional, so please
don't depend on it. I reserve the right to cache the select()ed
filehandle at creation time or at time of first emission at some
point in the future.
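The three OUTPUT forms listed above can be normalized into something print() accepts by inspecting C<ref>. The following is a minimal sketch using a hypothetical helper, C<open_output>, which is not part of XML::ValidWriter. It relies on Perl's built-in in-memory filehandles for the SCALAR case (so no IO::* modules are needed), and it empties the scalar when the handle is opened, whereas the module documents truncation on reset() or DESTROY().

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper (not part of XML::ValidWriter): turn an OUTPUT
# parameter into a handle that print() accepts.
sub open_output {
    my ($output) = @_;

    # No OUTPUT: return nothing so the caller falls back to the
    # currently select()ed handle at each emission.
    return unless defined $output;

    if ( ref $output eq 'SCALAR' ) {
        $$output = '';    # start from an empty buffer
        open my $fh, '>>', $output or croak "Can't open in-memory handle: $!";
        return $fh;
    }

    # Glob refs and IO objects are used as-is.
    return $output if ref $output;

    # Anything else is a file name to create or truncate.
    open my $fh, '>', $output or croak "Can't create '$output': $!";
    return $fh;
}

my $buf = 'stale contents';
my $fh  = open_output( \$buf );
print {$fh} '<a />';
close $fh;
print "$buf\n";    # <a />
```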
=cut
sub new {
my $class = shift ;
$class = ref $class || $class ;
## Pseudo-hashes (bless [ \%FIELDS ], $class) were removed in perl 5.10;
## fields::new() builds an object restricted to the declared fields.
my XML::ValidWriter $self = fields::new( $class ) ;
$self->{SHOULD_WARN} = 1 ;
while ( @_ ) {
for my $parm ( shift ) {
if ( $parm eq 'DOCTYPE' ) {
croak "Can't have two DOCTYPE parms"
if defined $self->{DOCTYPE} ;
$self->{DOCTYPE} = shift ;
}
elsif ( $parm eq 'OUTPUT' ) {
croak "Can't have two OUTPUT parms"
if defined $self->{OUTPUT} || defined $self->{FILE_NAME} ;
if ( ref $_[0] ) {
$self->{OUTPUT} = shift ;
}
else {
$self->{FILE_NAME} = shift ;
}
}
else {
croak "Unknown parameter '$parm'" ;
}
}
}
## Find the original caller
my $caller_depth = 1 ;
++$caller_depth
while caller && isa( scalar( caller $caller_depth ), __PACKAGE__ ) ;
$self->{CREATED_AT} = join( ', ', (caller( $caller_depth ))[1,2] );
$self->reset ;
return $self ;
}
=item import
There's little reason to call this method directly; it gets called
when you use this module:
use XML::ValidWriter qw( :all ) ;
In addition to the normal functionality of exporting functions like
startTag() and endTag(), XML::ValidWriter's import() can create
functions corresponding to all elements in a DTD. This is done using
the special C<:dtd_tags> export symbol. For example,
use XML::Doctype NAME => 'FooML', SYSTEM_ID => 'fooml.dtd' ;
use XML::ValidWriter qw( :dtd_tags ) ;
where fooml.dtd refers to a tag type of 'blurb' causes these
functions to be imported:
blurb() # calls defaultWriter->startTag( 'blurb', @_ ) ;
blurb_element() # calls defaultWriter->dataElement( 'blurb', @_ ) ;
empty_blurb() # calls defaultWriter->emptyTag( 'blurb', @_ ) ;
end_blurb() # calls defaultWriter->endTag( 'blurb' ) ;
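Generating these functions amounts to assigning closures to typeglobs in the target package. The following is a minimal sketch using a hypothetical C<install_tag_subs> and toy C<start_tag>/C<end_tag>/C<empty_tag> stand-ins that only build a string; it is not this module's import(). It omits the C<_element> variants and does not resolve collisions between aliases and real names. For a name containing /\W/ such as C<d-e>, it installs the real spelling (reachable only through a symbolic reference) plus a C<d_e> alias.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the default writer; they only build a string.
my $out = '';
sub start_tag { my $name = shift; $out .= "<$name>" }
sub end_tag   { my $name = shift; $out .= "</$name>" }
sub empty_tag { my $name = shift; $out .= "<$name />" }

# Install start, end_ and empty_ functions for each element type into
# package $pkg. A name containing /\W/ is installed under its real
# spelling and under an alias with each \W character replaced by '_'.
# Collisions between such aliases and real names are not handled here.
sub install_tag_subs {
    my ( $pkg, @elements ) = @_;
    no strict 'refs';
    for my $elt (@elements) {
        my %subs = (
            $elt         => sub { start_tag($elt) },
            "end_$elt"   => sub { end_tag($elt) },
            "empty_$elt" => sub { empty_tag($elt) },
        );
        while ( my ( $name, $code ) = each %subs ) {
            *{"${pkg}::$name"} = $code;
            ( my $alias = $name ) =~ s/\W/_/g;
            *{"${pkg}::$alias"} = $code if $alias ne $name;
        }
    }
}

install_tag_subs( 'Demo', qw( a b_c d-e ) );

Demo::a();
Demo::empty_b_c();
Demo::d_e();    # alias for the 'd-e' start-tag function
{
    no strict 'refs';
    &{'Demo::end_d-e'}();    # symbolic call using the real element name
}
Demo::end_a();
print "$out\n";    # <a><b_c /><d-e></d-e></a>
```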
The range of characters for element types is much larger than
the range of characters for bareword perl subroutine names, which
are limited to [a-zA-Z0-9_]. For element type names outside that range,
XML::ValidWriter will export an oddly named function that you can call
through a symbolic reference (you will need C<no strict 'refs'> if you are doing
a C