10 November 2011

How Reliable are PL/Scope Results? (9403)

The 9 November quiz on PL/Scope asked players to draw conclusions about a package body from the data in the user_identifiers view.

One of our most dedicated players, Iudith Mentzel, spent some time testing out the results one gets from PL/Scope (and queries against user_identifiers) for various uses of labels and GOTOs. I publish her comments below for your consideration.

From Iudith Mentzel

I don't want to object to either the results, which are somewhat "colorful" , or the PL/Scope feature itself, but it looks like the "safe usage" of the feature is at least a little bit "less wider" that one may be (too optimistically) tempted to believe .

While the reasoning behind the answer presented for each choice is completely logical and probably follows the reasoning that the players used, driving categorical conclusions about the code contents by ONLY looking at the data gathered by the PL/SQL Scope feature can sometimes be a little bit dangerous ...

For example:

The LABEL and GOTO issue seems very clear on a first glance, however, here is a small example of what can happen if we "tweak" the code a little bit:
-- amendment to add a GOTO ...
CREATE OR REPLACE PACKAGE BODY plch_pkg
IS
   PROCEDURE do_stuff
   IS
      l_items   DBMS_SQL.number_table;
   BEGIN
      FORALL indx IN 1 .. l_items.COUNT
         UPDATE plch_stuff
            SET amount = l_items (indx);

      GOTO all_done;
      NULL;
      <<all_done>>

      DBMS_OUTPUT.put_line (SYSDATE);      

   END do_stuff;
END plch_pkg;
/

-- there is NO LABEL at all in the result set, though we have a label and a GOTO !!!
SELECT type, usage, name
  FROM user_identifiers 
 WHERE object_name = 'PLCH_PKG'
 ORDER BY 1, 2
/

TYPE               USAGE       NAME
------------------ ----------- ------------------------------
ITERATOR           DECLARATION INDX
ITERATOR           REFERENCE   INDX
PACKAGE            DECLARATION PLCH_PKG
PACKAGE            DEFINITION  PLCH_PKG
PROCEDURE          DECLARATION DO_STUFF
PROCEDURE          DEFINITION  DO_STUFF
SYNONYM            CALL        PLITBLM
SYNONYM            REFERENCE   DBMS_SQL
VARIABLE           DECLARATION L_ITEMS
VARIABLE           REFERENCE   L_ITEMS
VARIABLE           REFERENCE   L_ITEMS

11 rows selected.
This is probably because the PL/SQL Optimizing compiler has removed the "non-effective" stuff ...but if we replace the NULL with some other stuff:
CREATE OR REPLACE PACKAGE BODY plch_pkg
IS
   PROCEDURE do_stuff
   IS
      l_items   DBMS_SQL.number_table;
   BEGIN
      FORALL indx IN 1 .. l_items.COUNT
         UPDATE plch_stuff
            SET amount = l_items (indx);

      GOTO all_done;
      DBMS_OUTPUT.put_line('Some stuff');
      <<all_done>>

      DBMS_OUTPUT.put_line (SYSDATE);      

   END do_stuff;
END plch_pkg;
/
then the LABEL is back, though the code still performs exactly the same as before!
SELECT type, usage, name
  FROM user_identifiers 
 WHERE object_name = 'PLCH_PKG'
 ORDER BY 1, 2
/

TYPE            USAGE       NAME
--------------- ----------- --------------------
FUNCTION        CALL        SYSDATE
ITERATOR        DECLARATION INDX
ITERATOR        REFERENCE   INDX
LABEL           DECLARATION ALL_DONE
LABEL           REFERENCE   ALL_DONE
PACKAGE         DECLARATION PLCH_PKG
PACKAGE         DEFINITION  PLCH_PKG
PROCEDURE       DECLARATION DO_STUFF
PROCEDURE       DEFINITION  DO_STUFF
SYNONYM         CALL        PLITBLM
SYNONYM         REFERENCE   DBMS_OUTPUT
SYNONYM         REFERENCE   DBMS_SQL
SYNONYM         REFERENCE   DBMS_OUTPUT
VARIABLE        DECLARATION L_ITEMS
VARIABLE        REFERENCE   L_ITEMS
VARIABLE        REFERENCE   L_ITEMS

16 rows selected.
Now I add one more label....
CREATE OR REPLACE PACKAGE BODY plch_pkg
IS
   PROCEDURE do_stuff
   IS
      l_items   DBMS_SQL.number_table;
   BEGIN
      FORALL indx IN 1 .. l_items.COUNT
         UPDATE plch_stuff
            SET amount = l_items (indx);

      GOTO all_done;
      NULL;
      <<all_done>>

      DBMS_OUTPUT.put_line (SYSDATE);      

      <<another_label>>
      DBMS_OUTPUT.put_line (SYSDATE);
   END do_stuff;
END plch_pkg;
/
And still no LABEL seen in the output, though we have two labels and one GOTO ...
SELECT type, usage, name
  FROM user_identifiers 
 WHERE object_name = 'PLCH_PKG'
 ORDER BY 1, 2
/

TYPE            USAGE       NAME
--------------- ----------- --------------------
ITERATOR        DECLARATION INDX
ITERATOR        REFERENCE   INDX
PACKAGE         DECLARATION PLCH_PKG
PACKAGE         DEFINITION  PLCH_PKG
PROCEDURE       DECLARATION DO_STUFF
PROCEDURE       DEFINITION  DO_STUFF
SYNONYM         CALL        PLITBLM
SYNONYM         REFERENCE   DBMS_SQL
VARIABLE        DECLARATION L_ITEMS
VARIABLE        REFERENCE   L_ITEMS
VARIABLE        REFERENCE   L_ITEMS

11 rows selected.
However, it all depends on where the label is located, for example:
-- adding still another label, but at the beginning 
CREATE OR REPLACE PACKAGE BODY plch_pkg
IS
   PROCEDURE do_stuff
   IS
      l_items   DBMS_SQL.number_table;
   BEGIN
      <<still_another_label>>
      FORALL indx IN 1 .. l_items.COUNT
         UPDATE plch_stuff
            SET amount = l_items (indx);

      GOTO all_done;
      NULL;
      <<all_done>>

      DBMS_OUTPUT.put_line (SYSDATE);      

      <<another_label>>
      DBMS_OUTPUT.put_line (SYSDATE);
   END do_stuff;
END plch_pkg;
/

-- now the first label only appears ...
SELECT type, usage, name
  FROM user_identifiers 
 WHERE object_name = 'PLCH_PKG'
 ORDER BY 1, 2
/

TYPE               USAGE       NAME
------------------ ----------- ------------------------------
ITERATOR           DECLARATION INDX
ITERATOR           REFERENCE   INDX
LABEL              DECLARATION STILL_ANOTHER_LABEL
PACKAGE            DECLARATION PLCH_PKG
PACKAGE            DEFINITION  PLCH_PKG
PROCEDURE          DECLARATION DO_STUFF
PROCEDURE          DEFINITION  DO_STUFF
SYNONYM            CALL        PLITBLM
SYNONYM            REFERENCE   DBMS_SQL
VARIABLE           DECLARATION L_ITEMS
VARIABLE           REFERENCE   L_ITEMS
VARIABLE           REFERENCE   L_ITEMS

12 rows selected.
Now we have two labels, but still only one LABEL declaration, though they are both "equally uneffective" ...

Below I have two labels, one the target of a GOTO, but still no label appears in the user_identifiers view:
CREATE OR REPLACE PACKAGE BODY plch_pkg
IS
   PROCEDURE do_stuff
   IS
      l_items   DBMS_SQL.number_table;
   BEGIN
      FORALL indx IN 1 .. l_items.COUNT
         UPDATE plch_stuff
            SET amount = l_items (indx);

      GOTO another_label;
      NULL;
      <<all_done>>

      DBMS_OUTPUT.put_line (SYSDATE);      

      <<another_label>>
      DBMS_OUTPUT.put_line (SYSDATE);
   END do_stuff;
END plch_pkg;
/

-- still no label, though here logic does matter !
SELECT type, usage, name
  FROM user_identifiers 
 WHERE object_name = 'PLCH_PKG'
 ORDER BY 1, 2
/

TYPE            USAGE       NAME
--------------- ----------- --------------------
ITERATOR        DECLARATION INDX
ITERATOR        REFERENCE   INDX
PACKAGE         DECLARATION PLCH_PKG
PACKAGE         DEFINITION  PLCH_PKG
PROCEDURE       DECLARATION DO_STUFF
PROCEDURE       DEFINITION  DO_STUFF
SYNONYM         CALL        PLITBLM
SYNONYM         REFERENCE   DBMS_SQL
VARIABLE        DECLARATION L_ITEMS
VARIABLE        REFERENCE   L_ITEMS
VARIABLE        REFERENCE   L_ITEMS

11 rows selected.
The label was effective, but still NOT shown in user_identifiers. The CALL to SYSDATE also NOT shown !!! The reference to DBMS_OUTPUT synonym also NOT shown !!!

I just wanted to emphasize how volatile it is to drive conclusions about source code based ONLY on the results in USER_IDENTIFIERS ...

These results seem to be generated AFTER the compiler optimizes the source code so, at least in some aspects, they may be misleading ...

Regarding the choice that asked about FORALL, though the reasoning behind it seems correct, equally to you and to us, in an after-thought it also can be argued ...and this because using FORALL requires a collection to be used, and that would probably introduce additional data into the USER_IDENTIFIERS result set, whether it is a DBMS_SQL based collection, one based on a locally defined TYPE or even on a type referenced from some other package ...

The output for the sample package shown in the Verification code looks like this:
CREATE OR REPLACE PACKAGE BODY plch_pkg
IS
   PROCEDURE do_stuff
   IS
      l_items   DBMS_SQL.number_table;
   BEGIN
      FORALL indx IN 1 .. l_items.COUNT
         UPDATE plch_stuff
            SET amount = l_items (indx);

      <<all_done>>
      DBMS_OUTPUT.put_line (SYSDATE);
   END do_stuff;
END plch_pkg;
/

-- we see some SYNONYMS in the output, that were not there in the original quiz
SELECT type, usage, name
  FROM user_identifiers 
 WHERE object_name = 'PLCH_PKG'
 ORDER BY 1, 2
/

TYPE               USAGE       NAME
------------------ ----------- ------------------------------
FUNCTION           CALL        SYSDATE
ITERATOR           DECLARATION INDX
ITERATOR           REFERENCE   INDX
LABEL              DECLARATION ALL_DONE
PACKAGE            DECLARATION PLCH_PKG
PACKAGE            DEFINITION  PLCH_PKG
PROCEDURE          DECLARATION DO_STUFF
PROCEDURE          DEFINITION  DO_STUFF
SYNONYM            CALL        PLITBLM
SYNONYM            REFERENCE   DBMS_OUTPUT
SYNONYM            REFERENCE   DBMS_SQL
VARIABLE           DECLARATION L_ITEMS
VARIABLE           REFERENCE   L_ITEMS
VARIABLE           REFERENCE   L_ITEMS

14 rows selected.
A last remark is about deciding whether a variable is defined at the package level or inside a subprogram: I think this can be done (maybe preferably) by checking whether the "parent" (the context owner) of the variable declaration is the PACKAGE, rather than a subprogram, for example:
--check variable context ownership
CREATE OR REPLACE PACKAGE BODY plch_pkg
IS
   g_variable   NUMBER ;

   PROCEDURE do_stuff
   IS
      l_items   DBMS_SQL.number_table;
   BEGIN
      FORALL indx IN 1 .. l_items.COUNT
         UPDATE plch_stuff
            SET amount = l_items (indx);

      <<all_done>>
      DBMS_OUTPUT.put_line (SYSDATE);
   END do_stuff;
END plch_pkg;
/

COLUMN TYPE FORMAT A15
COLUMN NAME FORMAT A20

SELECT var.type, var.usage, var.name, parent.type, parent.usage, parent.name
  FROM user_identifiers  var,
       user_identifiers  parent
 WHERE var.object_name = 'PLCH_PKG'
   AND var.type        = 'VARIABLE'
   AND var.usage       = 'DECLARATION'
   AND parent.object_name = var.object_name
   AND parent.object_type = var.object_type
   AND parent.usage_id    = var.usage_context_id
 ORDER BY 1, 2
/

TYPE            USAGE       NAME                 TYPE            USAGE       NAME
--------------- ----------- -------------------- --------------- ----------- ----------------
VARIABLE        DECLARATION L_ITEMS              PROCEDURE       DEFINITION  DO_STUFF
VARIABLE        DECLARATION G_VARIABLE           PACKAGE         DEFINITION  PLCH_PKG

2 rows selected.
In summary, looking only at the data in USER_IDENTIFIERS, we cannot derive 100% precise (YES/NO) conclusions regarding ALL aspects the code ... Just a few thoughts regarding a tough quiz ... and, as I see, not just for me ...For some reason, it reminds me of the one related to "implicit conversions" in the previous quarter ...

Best Regards,
Iudith

6 comments:

  1. Iudith wonders if the outcome of a PL/Scope analysis is affected by the PL/SQL optimizer. I don't believe this is the case. Try running the script below. You get the same result when the optimization level is 0 (disabled) and 2.

    [Sorry about the poor code formatting. Blogger doesn't allow pre tags in comments. Ridiculous!]

    ALTER SESSION SET plscope_settings='identifiers:all'
    /

    ALTER session set plsql_optimize_level=0
    /

    CREATE TABLE plch_stuff
    (
    amount NUMBER
    , rating INTEGER
    )
    /

    CREATE OR REPLACE PACKAGE plch_pkg
    IS
    PROCEDURE do_stuff;
    END plch_pkg;
    /

    CREATE OR REPLACE PACKAGE BODY plch_pkg
    IS
    PROCEDURE do_stuff
    IS
    l_items DBMS_SQL.number_table;
    BEGIN
    FORALL indx IN 1 .. l_items.COUNT
    UPDATE plch_stuff
    SET amount = l_items (indx);

    GOTO all_done;
    NULL;

    <>
    DBMS_OUTPUT.put_line (SYSDATE);
    END do_stuff;
    END plch_pkg;
    /

    SELECT TYPE, usage, name
    FROM user_identifiers
    WHERE object_name = 'PLCH_PKG'
    ORDER BY 1, 2
    /

    ALTER session set plsql_optimize_level=2
    /

    ALTER PACKAGE plch_pkg COMPILE BODY
    /

    SELECT TYPE, usage, name
    FROM user_identifiers
    WHERE object_name = 'PLCH_PKG'
    ORDER BY 1, 2
    /

    DROP TABLE plch_stuff
    /

    DROP PACKAGE plch_pkg
    /

    ReplyDelete
  2. Hello Steven, All,

    I just checked and the output is the same under the two optimization levels.

    TYPE USAGE NAME
    --------------- ----------- --------------------
    ITERATOR DECLARATION INDX
    ITERATOR REFERENCE INDX
    PACKAGE DECLARATION PLCH_PKG
    PACKAGE DEFINITION PLCH_PKG
    PROCEDURE DECLARATION DO_STUFF
    PROCEDURE DEFINITION DO_STUFF
    SYNONYM CALL PLITBLM
    SYNONYM REFERENCE DBMS_SQL
    VARIABLE DECLARATION L_ITEMS
    VARIABLE REFERENCE L_ITEMS
    VARIABLE REFERENCE L_ITEMS

    11 rows selected.

    All my tests are performed in Oracle 11.1.0.7.0.


    We see however that the output DOES or DOES NOT contain the LABEL rows and also other rows
    like the SYSDATE call or DBMS_OUTPUT reference, subject to how we "play around" with the labels.

    So, even under PLSQL_OPTIMIZE_LEVEL=0, looks like the optimizer still performs some "code analysis" that "filters out" some of the elements from the source code, before generating the user_identifiers data.
    Or, to say it differently, there maybe exist "internal optimizing compiler" actions
    that are not controlled by this parameter.

    Otherwise we are only left with the conclusion that maybe the PL/SQL Scope feature has some serious bugs not yet discovered ?!?

    It would be nice if this feature would indeed analyze the source code exactly "as is",
    without any intervening corrections, optimizations, a.s.o, that way we would indeed have a completely accurate image of our code ... of course if we know exactly how to interpret and how to look after
    what we are willing to check.

    I'd be glad to hear other players opinion and experience.

    Thanks a lot & Best Regards,
    Iudith

    ReplyDelete
  3. I think that it is likely you have uncovered bugs in PL/Scope. I don't think this has anything to do with the optimization process.

    I have sent a note to Bryn Llewellyn with an example of what you have uncovered. I will let you all know what he says.

    ReplyDelete
  4. Hello Steven,

    Wow, that would be far shooting indeed :) :)

    Maybe it's time to have a "permanent representation" of the PL/SQL Challenge team
    at Oracle Headquarters, before they release new features ? ) :)

    This just shows that the software producers themselves are also humans ... they also can err
    just like us, the mere mortals ...

    Thank you so much & Best Regards,
    Iudith

    ReplyDelete
  5. PL/Scope results appear to only include references to identifiers for which the associated declaration has been recorded (I suspect this is because the signature is determined when the declaration is processed). The output from the following sequence of statements demonstrates the affect of not storing PL/Scope details for referenced identifiers:

    ALTER session SET plscope_settings='IDENTIFIERS:NONE'
    /
    CREATE OR REPLACE PROCEDURE plch_plscope_test1 AS
    BEGIN
    RETURN;
    END plch_plscope_test1;
    /
    ALTER SESSION SET PLSCOPE_SETTINGS='IDENTIFIERS:ALL'
    /
    CREATE OR REPLACE PROCEDURE plch_plscope_test2 AS
    BEGIN
    plch_plscope_test1;
    END plch_plscope_test2;
    /
    SELECT *
    FROM sys.user_identifiers
    WHERE object_name LIKE 'PLCH\_PLSCOPE\_TEST%' ESCAPE '\'
    ORDER BY object_name, line, col
    /
    ALTER PROCEDURE plch_plscope_test1 COMPILE
    /
    ALTER PROCEDURE plch_plscope_test2 COMPILE
    /
    SELECT *
    FROM sys.user_identifiers
    WHERE object_name LIKE 'PLCH\_PLSCOPE\_TEST%' ESCAPE '\'
    ORDER BY object_name, line, col
    /
    DROP PROCEDURE plch_plscope_test2
    /
    DROP PROCEDURE plch_plscope_test1
    /


    The following are the output from the queries:

    NAME SIGNATURE TYPE OBJECT_NAME OBJECT_TYPE USAGE USAGE_ID LINE COL USAGE_CONTEXT_ID
    ------------------------------ -------------------------------- ------------------ ------------------------------ ------------- ----------- ---------------------- ---------------------- ---------------------- ----------------------
    PLCH_PLSCOPE_TEST2 75DE7E708891CE1B5F9297A20D839A16 PROCEDURE PLCH_PLSCOPE_TEST2 PROCEDURE DECLARATION 1 1 11 0
    PLCH_PLSCOPE_TEST2 75DE7E708891CE1B5F9297A20D839A16 PROCEDURE PLCH_PLSCOPE_TEST2 PROCEDURE DEFINITION 2 1 11 1

    NAME SIGNATURE TYPE OBJECT_NAME OBJECT_TYPE USAGE USAGE_ID LINE COL USAGE_CONTEXT_ID
    ------------------------------ -------------------------------- ------------------ ------------------------------ ------------- ----------- ---------------------- ---------------------- ---------------------- ----------------------
    PLCH_PLSCOPE_TEST1 2C9891C7A26B1B39EC3B5894664ECE69 PROCEDURE PLCH_PLSCOPE_TEST1 PROCEDURE DEFINITION 2 1 11 1
    PLCH_PLSCOPE_TEST1 2C9891C7A26B1B39EC3B5894664ECE69 PROCEDURE PLCH_PLSCOPE_TEST1 PROCEDURE DECLARATION 1 1 11 0
    PLCH_PLSCOPE_TEST2 75DE7E708891CE1B5F9297A20D839A16 PROCEDURE PLCH_PLSCOPE_TEST2 PROCEDURE DECLARATION 1 1 11 0
    PLCH_PLSCOPE_TEST2 75DE7E708891CE1B5F9297A20D839A16 PROCEDURE PLCH_PLSCOPE_TEST2 PROCEDURE DEFINITION 2 1 11 1
    PLCH_PLSCOPE_TEST1 2C9891C7A26B1B39EC3B5894664ECE69 PROCEDURE PLCH_PLSCOPE_TEST2 PROCEDURE CALL 3 3 3 2

    procedure PLCH_PLSCOPE_TEST2 dropped.
    procedure PLCH_PLSCOPE_TEST1 dropped.


    As can be seen, when PLCH_PLSCOPE_TEST1 is compiled without recording its identifiers, calls to it are also not recorded.

    ReplyDelete
  6. Hello JHall, All,

    This finding is very interesting.

    In fact, when looking at the PL/SCOPE feature, it includes some aspects that "overlap" in a certain meaning with the Oracle regular dependency tracking, that is, the one between database objects which is recorded
    in the USER_DEPENDENCIES and similar views.

    When a procedure calls another procedure using a static procedure call, then this dependency is tracked at both the database level (always) and at PL/SCOPE level (on demand, entirely or partially, as you have observed).

    In my opinion, making the data set recorded for one object (the calling procedure) dependent
    on whether the called procedure has PL/SCOPE data collected or not looks as "kind of a bug".

    Why do I say "kind of" ?
    Because it seems that Oracle has implemented this behavior on purpose.

    The documentation says the following:
    http://download.oracle.com/docs/cd/E11882_01/appdev.112/e25518/adfns_plscope.htm#g1010526

    "The packages STANDARD and DBMS_STANDARD declare and define base types, such as VARCHAR2 and NUMBER, and subprograms such as RAISE_APPLICATION_ERROR. If your database has PL/Scope identifier data for these packages, PL/Scope can track your usage of the identifiers that these packages create."

    I checked in my database and found that both these two standard packages do have PL/SCOPE identifiers recorded.

    The behavior is probably the same for regular schema pl/sql objects.


    I think that this feature will probably become more customizable in the future versions of PL/SQL, allowing for example to collect only some types of the data, instead of the current "all or nothing" approach, in a similar way in which the PLSQL_WARNINGS are customizable up to a single specific warning message.


    Thanks a lot & Best Regards,
    Iudith Mentzel

    ReplyDelete