When you say 'XML', you'd probably also say 'Java' -- until now, because a stock Oracle database installation gives you a PL/SQL wrapper with which you can access the XML API's. From now on, after 'Java' you'll say 'PL/SQL'.
To get an idea of the possibilities, I developed a PL/SQL portlet which parses XML files and walks through the results to pick the interesting pieces and displays those to the Portal user. While I can't give you the source code, the examples should give more than enough clues to build your own portlet.
For this article, it would be useful to have some basic understanding of XML, but it's not a strict requirement.
The XML files of which the contents will be parsed and displayed, is created in the RDF dialect. RDF stands for Resource Description Framework and is used by many sites to expose their headlines. One example of such a site is Slashdot. This is a real community site where users can submit links to interesting news on the web. The link is then displayed along with a short description and users of the website can then discuss the news item.
When the submission is accepted by the site owners, a new RDF file is generated which can be found here.
Such an RDF file looks like this: http://slashdot.org/slashdot.rdf
With such a file, we'll have all the information for a portlet which shows the headlines of Slashdot: we have a name, a link and a description to the main site itself. We have the headlines, which we can make hyperlinks pointing directly to the items themselves. We even have a nice image and a URL to submit a searchstring to!
The portlet consists of two parts: one part that actually parses the RDF files and displays the headlines, and a batchjob which retrieves the RDF files in the background. We'll concentrate on the displaying part first.
The brunt of the work is done by two procedures, show() and print_rdf090(). That first one retrieves the current user its preferences (which sites will be displayed in the portlet). It loops over the second one, which does the actual work of displaying the headlines.
The following code displays a piece of the declare part of the procedure. Using the source code of Slashdot's RDF file, you'll probably have a good idea what each variable is meant for.
452 procedure print_rdf090 453 ( p_site_id in rdf_sites.site_id%type 454 ) 455 is 456 cursor c_sit 457 ( b_site_id in rdf_sites.site_id%type 458 ) 459 is 460 select sit.rdf_file 461 from rdf_sites sit 462 where site_id = b_site_id; 463 r_sit c_sit%rowtype; 464 465 l_found boolean; -- To check whether cursor was successful 466 467 l_parser xmlparser.parser; 468 l_xmldoc xmldom.domdocument; 469 l_doc_elt xmldom.domelement; 470 l_root_node xmldom.domnode; 471 l_rdf_childnodes xmldom.domnodelist; 472 l_channel_node xmldom.domnode; 473 l_item_node xmldom.domnode; 474 l_item_childnodes xmldom.domnodelist; 475 l_node xmldom.domnode; -- A temporary DOM node 476 l_title varchar2(4000); 477 l_link varchar2(4000); 478 479 begin 480 open c_sit(p_site_id); 481 fetch c_sit into r_sit; 482 l_found := c_sit%found; 483 close c_sit; 484 485 if not l_found 486 then 487 raise no_data_found; 488 else 489 -- We found a record, but the clob isn't filled. 490 if dbms_lob.getlength(r_sit.rdf_file) = 0 491 then 492 raise no_data_found; 493 end if; 494 end if;
And the following fragment of code will print the name of the website in the form of a hyperlink. The call to wwui_api_portlet.portlet_subheader_color() makes sure that we follow the Portal UI.
495 496 -- Parse document 497 l_parser := xmlparser.newparser; 498 xmlparser.parseclob(l_parser, r_sit.rdf_file); 499 l_xmldoc := xmlparser.getdocument(l_parser); 500 501 -- Get the child nodes of the root node 502 l_doc_elt := xmldom.getdocumentelement(l_xmldoc); 503 l_root_node := xmldom.makenode(l_doc_elt); 504 l_rdf_childnodes := xmldom.getchildnodes(l_root_node); 505 506 -- Print information in <channel> tag 507 l_channel_node := xmldom.item(l_rdf_childnodes, 0); 508 l_node := xmldom.getfirstchild(l_channel_node); -- go to title element 509 l_node := xmldom.getfirstchild(l_node); -- Go to text of <title> 510 l_title := xmldom.getNodeValue(l_node); 511 512 l_node := xmldom.getfirstchild(l_channel_node); -- go to title element 513 l_node := xmldom.getnextsibling(l_node); -- go to link element 514 l_node := xmldom.getfirstchild(l_node); -- Go to text of <link> 515 l_link := xmldom.getNodeValue(l_node); 516 517 -- Print header of the table 518 htp.tableopen('cellspacing=0 cellpadding=0 width="100%"'); 519 htp.p('<tr class="' || wwui_api_portlet.portlet_subheader_color || '">'); 520 wwutl_htp.tabledataopen; 521 htp.anchor(ctext => p(l_title), curl => l_link); 522 wwutl_htp.tabledataclose; 523 htp.tablerowclose; 524 525 -- Print items 526 for i in 2..xmldom.getlength(l_rdf_childnodes) - 1 527 loop 528 htp.tablerowopen; 529 l_item_node := xmldom.item(l_rdf_childnodes, i); -- Get an <item> 530 531 -- Get the text and the link 532 if xmldom.getnodename(l_item_node) = 'item' -- Just to make sure 533 then 534 l_item_childnodes := xmldom.getchildnodes(l_item_node); 535 536 -- Get the <title> tag and get its child, a text node 537 l_node := xmldom.item(l_item_childnodes, 0); 538 l_node := xmldom.getfirstchild(l_node); 539 l_title := xmldom.getNodeValue(l_node); 540 541 -- Get the <link> tag and get its child, a text node 542 l_node := xmldom.item(l_item_childnodes, 1); 543 l_node := xmldom.getfirstchild(l_node); 544 l_link := xmldom.getNodeValue(l_node); 545 546 wwutl_htp.tabledataopen; 547 htp.anchor( ctext => p(l_title), curl => l_link); 548 wwutl_htp.tabledataclose; 549 end if; 550 htp.tablerowclose; 551 end loop; 552 htp.tableclose; 553 554 xmlparser.freeparser(l_parser); 555 end print_rdf090;
In between the declaration of the variables and the above code, a cursor named c_sit is opened and a clob (Character Large OBject) is fetched in the variable r_sit.rdf_file. The clob contains the RDF file of the website we're currently looping over. It was put in the clob by the batchjob, which is discussed in the next section.
Line 497 to 499 parses the clob in a DOM document and the result is assigned to the l_xmldoc variable. This really shows the power of XML; we're not making a hundred calls to substr(), instr(), length() or are pouring out mystifying regular expressions -- we're just using a standard parser to put the XML-file in a standard data structure on which we work on using a standard API (the DOM). A reminder: that standard data structure is the 'infoset', that tree of nodes which represents the XML-file. A node can be a root node, an element node, an attribute node or a text node.
In line 502, we get the document element (which is the first tag of the document) and cast it to a node type. Then in line 504, we use the temporary variable l_node to get the text between the <title> tags. The following illustration shows a part of the tree of nodes which we are strolling down.
Lines 507-515 will now probably speak for themselves and as the comment in the code tells us, the data we assign to l_title and l_link are used to start printing a HTML table. After line 523 folllows a loop which gets the contents of the <item> tags and prints them as rows in the HTML table. The result is then as is shown in the screenshot below. Note that not Slashdot's headlines are displayed here, but FreshMeat and Linux.Com; as you can see, lots of sites use RDF.
We cache the RDF files by creating a batchjob which reads a number of URLs from a table, checks each URL for data and puts that data in clobs. The dbms_job and the utl_http package are used for this.
16 procedure refresh 17 is 18 cursor c_sit 19 ( b_err_limit rdf_sites.errors%type 20 ) 21 is 22 select sit.site_id 23 , sit.name 24 , sit.url 25 , sit.errors 26 , sit.rdf_file 27 from rdf_sites sit 28 where sit.errors <= b_err_limit 29 for update of sit.rdf_file, sit.errors; 30 r_sit c_sit%rowtype; 31 32 l_errors rdf_sites.errors%type; 33 l_rdf_file rdf_sites.rdf_file%type; 34 l_err_limit rdf_sites.errors%type; 35 36 l_req utl_http.req; 37 l_resp utl_http.resp; 38 l_name varchar2(256); 39 l_value varchar2(1024); 40 l_len integer; 41 begin 42 -- Initialize variables 43 l_err_limit := 3; -- TODO get from prefstore 44 utl_http.set_proxy(rdf_gen.get_proxy); 45 utl_http.set_response_error_check(false); 46 dbms_lob.createtemporary(l_rdf_file, true); 47 48 -- Loop through all sites and fetch the XML file 49 for i_sit in c_sit(l_err_limit) 50 loop 51 52 l_req := utl_http.begin_request(i_sit.url); 53 utl_http.set_header(l_req, 'User-Agent', 54 'Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 4.0)'); 55 l_resp := utl_http.get_response(l_req); 56 57 -- Look for clientside error and report it 58 if l_resp.status_code >= 400 59 and l_resp.status_code <= 599 60 then 61 utl_http.end_response(l_resp); 62 63 update rdf_sites 64 set errors = i_sit.errors + 1 65 where current of c_sit; 66 -- Everything went well, save the output for later use 67 else 68 -- First clean out the clob in the table 69 l_len := dbms_lob.getlength(i_sit.rdf_file); 70 if l_len > 0 71 then 72 dbms_lob.erase(i_sit.rdf_file, l_len); 73 dbms_lob.trim(i_sit.rdf_file, 0); 74 l_len := dbms_lob.getlength(i_sit.rdf_file); 75 end if; 76 77 -- Now start grabbing from the page and append it to our temp clob 78 dbms_lob.trim(l_rdf_file, 0); 79 begin 80 loop 81 utl_http.read_line(l_resp, l_value); 82 l_rdf_file := l_rdf_file || l_value; 83 end loop; 84 exception 85 when utl_http.end_of_body then 86 utl_http.end_response(l_resp); 87 end; 88 89 -- Temporary clob filled. Write it to the empty clob in the table. 90 dbms_lob.append(i_sit.rdf_file, l_rdf_file); 91 92 -- Succesful retrieval, so reset the number of errors 93 update rdf_sites 94 set errors = 0 95 where current of c_sit; 96 97 end if; 98 end loop; 99 end refresh;
Since websites change every now and then or are unreachable, the code for the batchjob is made a little robust. When retrieving the RDF file does not succeed, a counter is incremented and at a certain limit (configurable by the Portal Page owner), the URL is skipped and an error message is logged in the Portal log tables.
The section 'The Portlet' explains the structure of the code. Another possibility is not to scoop the data from the RDF using PL/SQL code, but to transform them directly to HTML by using XSLT.
You might be a guru concerning XSLT, but for the newbies: XSLT stands for XML Stylesheet Language for Transformations. It is an XML dialect with which a transformation of a source XML-document to a target can be described. For the target formats XML and HTML, things are made easy but really anything is possible.
We'll show the code below:
822 procedure rdf2html 823 ( p_site_id in rdf_sites.site_id%type 824 ) 825 is 826 cursor c_xsl 827 ( b_name in rdf_xslt.name%type 828 ) 829 is 830 select xsl.xsl_file 831 from rdf_xslt xsl 832 where xsl.name = b_name; 833 r_xsl c_xsl%rowtype; 834 835 cursor c_sit 836 ( b_site_id in rdf_sites.site_id%type 837 ) 838 is 839 select sit.rdf_file 840 from rdf_sites sit 841 where sit.site_id = b_site_id; 842 r_sit c_sit%rowtype; 843 844 l_found boolean; -- To check whether cursor was successful 845 C_XSL_NAME constant varchar2(30) := 'rdf2html'; 846 847 l_parser1 xmlparser.parser; 848 l_parser2 xmlparser.parser; 849 l_rdf_file xmldom.domdocument; 850 l_xsl_file xmldom.domdocument; 851 l_html clob; 852 853 l_stylesheet xslprocessor.stylesheet; 854 l_engine xslprocessor.processor; 856 begin 857 -- Initialize variables 858 dbms_lob.createtemporary(l_html, true); 859 860 -- Get the rdf file 861 open c_sit(p_site_id); 862 fetch c_sit into r_sit; 863 l_found := c_sit%found; 864 close c_sit; 865 866 if not l_found 867 then 868 raise no_data_found; 869 else 870 -- We found a record, but the clob isn't filled. 871 if dbms_lob.getlength(r_sit.rdf_file) = 0 872 then 873 raise no_data_found; 874 end if; 875 end if; 876 877 -- Get the stylesheet 878 open c_xsl('rdf2html'); 879 fetch c_xsl into r_xsl; 880 l_found := c_xsl%found; 881 close c_xsl; 882 883 if not l_found 884 then 885 raise no_data_found; 886 else 887 -- We found a record, but the clob isn't filled. 888 if dbms_lob.getlength(r_xsl.xsl_file) = 0 889 then 890 raise no_data_found; 891 end if; 892 end if; 893 894 -- Parse RDF document 895 l_parser1 := xmlparser.newparser; 896 xmlparser.parseclob(l_parser1, r_sit.rdf_file); 897 l_rdf_file := xmlparser.getdocument(l_parser1); 898 899 -- Parse XSL document 900 l_parser2 := xmlparser.newparser; 901 xmlparser.parseclob(l_parser2, r_xsl.xsl_file); 902 l_xsl_file := xmlparser.getdocument(l_parser2); 903 904 -- Autobots, *transform* !! 905 l_stylesheet := xslprocessor.newstylesheet(l_xsl_file, null); 906 l_engine := xslprocessor.newprocessor; 907 xslprocessor.processxsl(l_engine, l_stylesheet, l_rdf_file, l_html); 908 909 -- Clean up stuff 910 xslprocessor.freestylesheet(l_stylesheet); 911 xslprocessor.freeprocessor(l_engine); 912 xmldom.freedocument(l_xsl_file); 913 xmldom.freedocument(l_rdf_file); 914 xmlparser.freeparser(l_parser2); 915 xmlparser.freeparser(l_parser1); 916 917 -- We now have a clob l_html which has the result of the transformation. 918 dump_clob(l_html); 919 dbms_lob.freetemporary(l_html); 920 end rdf2html;
Two cursors are opened here and the RDF and the XSLT files are thrown in clobs. You have seen the code in lines 894 to 902 before; the clobs are parsed. The interesting lines are 905, 906 and 907: we transform the RDF and put it in the clob l_html. With a loop around it, the clob can be sent to the user its browser by calling htp.p().
A remark: performance isn't tested, but it's definitely smartest to do the parsing in the batchjob and create a datamodel which facilitates the data in the RDF-files. This hasn't been done yet since the code is rather alpha-quality and also because debugging code in a batchjob is a PITA... Note that when this option is chosen, using XSLT doesn't make sense anymore.
Anyway, it should be clear by now that XML and PL/SQL can cooperate quite nicely. For more information, I recommend the book of Steve Muench: http://www.oreilly.com/catalog/orxmlapp/
Steve is Oracle's 'XML evangelist'. And of course, a bare bones explanation can be found on http://otn.oracle.com