Got an interesting question today in RuOUG:
Some very simple PL/SQL procedures usually are completed within ~50ms, but sometimes sporadically longer than a second. For example, the easiest one from these procedures:
create or replace PROCEDURE XXXX (
P_ORG_NUM IN number,
p_result OUT varchar2,
p_seq OUT number
) AS
BEGIN
p_seq := P_ORG_NUM; p_result:='';
END;
sql_trace shows that it was executed for 1.001sec and all the time was “ON CPU”:
Trace file /opt/oracle/diag/rdbms/rdb4/rdb/trace/rdb_ora_7100.trc
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.10.0.0.0
Build label: RDBMS_19.10.0.0.0DBRU_SOLARIS.SPARC64_210107
ORACLE_HOME: /opt/oracle/product/19
System name: SunOS
Node name: node2.zzzzz.ru
Release: 5.11
Version: 11.3
Machine: sun4v
Oracle process number: 137
Unix process pid: 7100, image: oracle@node2.zzzzz.ru
*** 2021-05-26T15:38:31.321550+03:00
*** SESSION ID:(68.47940) 2021-05-26T15:38:31.321609+03:00
*** CLIENT ID:() 2021-05-26T15:38:31.321633+03:00
*** SERVICE NAME:(SYS$USERS) 2021-05-26T15:38:31.321656+03:00
*** MODULE NAME:(JDBC Thin Client) 2021-05-26T15:38:31.321679+03:00
*** ACTION NAME:() 2021-05-26T15:38:31.321703+03:00
*** CLIENT DRIVER:(jdbcthin : 19.3.0.0.0) 2021-05-26T15:38:31.321728+03:00
=====================
PARSING IN CURSOR #18446744071368904384 len=53 dep=0 uid=51 oct=47 lid=51 tim=190436799743741 hv=804128640 ad='12345678' sqlid='aabbaabbaa123'
BEGIN AAAA.XXXX(:1 ,:2 ,:3 ); END;
END OF STMT
PARSE #18446744071368904384:c=28,e=28,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=190436799743741
BINDS #18446744071368904384:
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1000000 frm=01 csi=873 siz=24 off=0
kxsbbbfp=ffffffff747c6c98 bln=22 avl=02 flg=05
value=6
Bind#1
oacdty=01 mxl=32767(32767) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1400010 frm=01 csi=873 siz=32767 off=0
kxsbbbfp=ffffffff747b8020 bln=32767 avl=00 flg=05
Bind#2
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1400000 frm=01 csi=873 siz=24 off=0
kxsbbbfp=ffffffff747c6c68 bln=22 avl=00 flg=05
*** 2021-05-26T15:38:33.172899+03:00
WAIT #18446744071368904384: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=190436800744970
EXEC #18446744071368904384:c=1104,e=1001110,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=190436800745125
XCTEND rlbk=0, rd_only=1, tim=190436800745192
Originally I asked for dtrace/flamegraph and got just this:
unix`_resume_from_idle+0x280
genunix`cv_timedwait_sig_hires+0x1c4
genunix`cv_waituntil_sig+0x90
genunix`nanosleep+0xd4
unix`syscall_trap+0x228
0xffffffff7c04dbcc
0xffffffff7c0393ec
0xffffffff7c02e084
0xffffffff7c02f860
0x1176ef724
0x118376518
0x118366b84
0x118365d14
0x118366078
0x118314b60
0x1183973cc
0x1182c707c
0x1102edef8
0x10aa60064
0x10c224038
0x10aa50ca8
0x115bbc370
0x10aa196dc
0x10aa2c59c
0x10aa50ca8
ns
value ------------- Distribution ------------- count
268435456 | 0
536870912 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
1073741824 | 0
But truss
showed much more interesting details:
0.000037 write(11, " B i n d # 2\n", 8) = 8
0.000045 write(12, " 2 0 m 7 } M X 8\n", 9) = 9
0.000022 lseek(11, 0, SEEK_CUR) = 4673
0.000038 write(11, " o a c d t y = 0 2 ".., 104) = 104
0.000040 write(12, " 2 0 m N } 5 Y e 1\n", 10) = 10
0.000020 lseek(11, 0, SEEK_CUR) = 4777
0.000037 write(11, " k x s b b b f p = f".., 52) = 52
0.000038 write(12, " 2 0 m 7 } j Y q\n", 9) = 9
0.000028 pathconf("/tmp/", _PC_NAME_MAX) = 255
0.000029 open("/tmp/.SHMLPESLD_rdb4_26_000000047334dc80", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0666) Err#17 EEXIST
1.000029 nanosleep(0xFFFFFFFF7FFF5FF0, 0xFFFFFFFF7FFF5FE0) = 0
0.000038 open("/tmp/.SHMLPESLD_rdb4_26_000000047334dc80", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0666) = 13
0.000023 close(13) = 0
0.000026 open("/tmp/.SHMDPESLD_rdb4_26_000000047334dc80", O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = 13
0.000028 unlink("/tmp/.SHMLPESLD_rdb4_26_000000047334dc80") = 0
0.000052 mmap(0x00000000, 528, PROT_READ|PROT_EXEC, MAP_PRIVATE, 13, 0) = 0xFFFFFFFF74700000
0.000020 close(13) = 0
0.000023 lseek(11, 0, SEEK_CUR) = 4829
0.000044 write(11, "\n * * * 2 0 2 1 - 0 5".., 38) = 38
0.000040 write(12, " - T + 7 } d b k t . c *".., 16) = 16
0.000020 lseek(11, 0, SEEK_CUR) = 4867
0.000037 write(11, " W A I T # 1 8 4 4 6 7".., 130) = 130
0.000037 write(12, " 8 4 j U 1 y 5 $ o 7 2 2".., 18) = 18
0.000020 lseek(11, 0, SEEK_CUR) = 4997
0.000037 write(11, " E X E C # 1 8 4 4 6 7".., 105) = 105
0.000037 write(12, " 8 0 j d 0 y 5 $ k x s t".., 20) = 20
0.000022 lseek(11, 0, SEEK_CUR) = 5102
As you can see it calls nanosleep
in the line #11 and sleeps 1 sec, but more interesting a line before and a few after nanosleep
: oracle tries to open (and fails “err17”) a file "/tmp/.SHMLPESLD_rdb4_XXX
” , which means that it tries to open a pl/sql procedure compiled in native mode (pesld – Plsql code Execution Shared object manager native code Loader). I’ve asked to check compilation parameters of the procedure, and it was really compiled in native mode:
PLSQL_OPTIMIZE_LEVEL 2
PLSQL_CODE_TYPE NATIVE
PLSQL_DEBUG FALSE
So obvious workaround to recompile those procedures in the INTERPRETED
mode fixed the issue.
UPDATE: Forgot to show what exactly is going on here: Finally I got normal stack:
unix`_resume_from_idle+0x280
genunix`cv_timedwait_sig_hires+0x1c4
genunix`cv_waituntil_sig+0x90
genunix`nanosleep+0xd4
unix`syscall_trap+0x228
libc.so.1`__nanosleep+0x4
libc.so.1`sleep+0x14
libc.so.1`__pos4obj_lock+0x68
libc.so.1`shm_open+0x50
oracle`pesld02_Map_Object+0xbc4
oracle`penglur+0x1d18
oracle`pfrins+0x8e4
oracle`pfri7_inst_body_common+0x214
oracle`pfri3_inst_body+0x18
oracle`pfrrun+0x9e0
oracle`plsql_run+0x22c
oracle`peicnt+0xfc
oracle`kkxexe+0x2f8
oracle`opiexe+0x5784
oracle`kpoal8+0x918
oracle`opiodr+0x448
oracle`ttcpip+0x430
oracle`opitsk+0x83c
oracle`opiino+0x39c
oracle`opiodr+0x448
ns
value ------------- Distribution ------------- count
268435456 | 0
536870912 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3
1073741824 | 0
As you can see pesld02_Map_Object
is trying to load (map) a user-defined function (native compilation) and calls shm_open
which calls __pos4obj_lock
and if we look at it, we can find why and when it calls sleep(which calls nanosleep
internally):
int
__pos4obj_lock(const char *name, const char *ltype)
{
char *dfile;
int fd;
int limit = 64;
if ((dfile = __pos4obj_name(name, ltype)) == NULL) {
return (-1);
}
while (limit-- > 0) {
if ((fd = __open_nc(dfile, O_RDWR | O_CREAT | O_EXCL, 0666))
< 0) {
if (errno != EEXIST)
break;
(void) sleep(1);
continue;
}
(void) __close_nc(fd);
free(dfile);
return (1);
}
free(dfile);
return (-1);
}