Oracle SQL
  • LICENSE

SQL: Fast ways to count unique characters in the string

Posted on August 19, 2019 by Sayan Malakshinov Posted in oracle, PL/SQL optimization, query optimizing, SQL

Test data:

create table t_str as
select round(dbms_random.value(1e10,9e10)) str from dual connect by level<=1e5
/

PL/SQL variant:

with
  function ff(s varchar2) return varchar2 
  as
      type avarchars is table of varchar2(100) index by varchar2(1);
      st  avarchars;
      idx varchar2(1);
      res varchar2(10);
      
      function iterate( idx in out nocopy varchar2, arr in out nocopy avarchars) 
         return boolean
      as --pragma inline;
      begin
         if idx is null
            then idx:=arr.first; 
            else idx:=arr.next(idx);
         end if;
         return idx is not null;
      end;  
   begin
     for i in 1..length(s) loop
        st(substr(s,i,1)):=1;
     end loop;
     while iterate(idx,st) loop
        res:=res||idx;
     end loop;
     return res;
   end;

select min(ff(str)) res
from t_str
/

SQL-only variant:

select min(fstr)
from t_str t
     cross apply (
     select listagg(c) within group (order by 1) fstr
     from (select
            distinct substr(t.str, level, 1) c
           from dual
           connect by level <= length(t.str)
          )
     )
/

Timings:

SQL> create table t_str as
  2  select round(dbms_random.value(1e10,9e10)) str from dual connect by level<=1e5
  3  /

Table created.

Elapsed: 00:00:00.55
SQL> with
  2    function ff(s varchar2) return varchar2
  3    as
  4        type avarchars is table of varchar2(100) index by varchar2(1);
  5        st  avarchars;
  6        idx varchar2(1);
  7        res varchar2(10);
  8
  9        function iterate( idx in out nocopy varchar2, arr in out nocopy avarchars)
 10           return boolean
 11        as --pragma inline;
 12        begin
 13           if idx is null
 14              then idx:=arr.first;
 15              else idx:=arr.next(idx);
 16           end if;
 17           return idx is not null;
 18        end;
 19     begin
 20       for i in 1..length(s) loop
 21          st(substr(s,i,1)):=1;
 22       end loop;
 23       while iterate(idx,st) loop
 24          res:=res||idx;
 25       end loop;
 26       return res;
 27     end;
 28
 29  select min(ff(str)) res
 30  from t_str
 31  /

RES
--------------------------------------------------------------
0123

Elapsed: 00:00:00.48
SQL> select min(fstr) res2
  2  from t_str t
  3       cross apply (
  4       select listagg(c) within group (order by 1) fstr
  5       from (select
  6              distinct substr(t.str, level, 1) c
  7             from dual
  8             connect by level <= length(t.str)
  9            )
 10       )
 11  /

RES2
--------------------------------------------------------------
0123

Elapsed: 00:00:01.01

And much easier variant if you need your strings contain digits only:

select min(translate('0123456789', translate('z0123456789','z'||str,'z'), chr(0)))
from t_str
sql
« Thoughts on Big Data SQL SmartScan
DMLs and the Columnar Cache on ADW »
photo Sayan Malakshinov

Oracle ACE Pro Oracle ACE Pro

DEVVYOracle Database Developer Choice Award winner

Oracle performance tuning expert

UK / Cambridge

LinkedIn   Twitter
sayan@orasql.org

Recent Posts

  • CBO and Partial indexing
  • Slow index access “COL=:N” where :N is NULL
  • Where does the commit or rollback happen in PL/SQL code?
  • :1 and SP2-0553: Illegal variable name “1”.
  • ORA exceptions that can’t be caught by exception handler

Recent Comments

  • Oracle SGA 값을 증가 시킬 때 발생 장애 원인 – DBA의 정석 on Example of controlling “direct path reads” decision through SQL profile hints (index_stats/table_stats)
  • Oracle SQL | Oracle diagnostic events — Cheat sheet on Where does the commit or rollback happen in PL/SQL code?
  • Functions & Subqueries | Oracle Scratchpad on Deterministic function vs scalar subquery caching. Part 3
  • Materialized views state turns into compilation_error after refresh - kranar.top - Answering users questions... on Friday prank: select from join join join
  • Exadata Catalogue | Oracle Scratchpad on When bloggers get it wrong – part 1
  • Exadata Catalogue | Oracle Scratchpad on Serial Scans failing to offload
  • lateral join – decorrelation gone wrong – svenweller on Lateral view decorrelation(VW_DCL) causes wrong results with rownum
  • 255 column catalogue | Oracle Scratchpad on Intra-block row chaining optimization in 12.2
  • 255 column catalogue | Oracle Scratchpad on row pieces, 255 columns, intra-block row chaining in details
  • opt_estimate catalogue | Oracle Scratchpad on Correct syntax for the table_stats hint

Blogroll

  • Alex Fatkulin
  • Alexander Anokhin
  • Andrey Nikolaev
  • Charles Hooper
  • Christian Antognini
  • Coskan Gundogar
  • David Fitzjarrell
  • Igor Usoltsev
  • Jonathan Lewis
  • Karl Arao
  • Mark Bobak
  • Martin Bach
  • Martin Berger
  • Neil Chandler
  • Randolf Geist
  • Richard Foote
  • Riyaj Shamsudeen
  • Tanel Poder
  • Timur Akhmadeev
  • Valentin Nikotin

Meta

  • Log in
  • Entries feed
  • Comments feed
  • WordPress.org
©Sayan Malakshinov. Oracle SQL