Oracle Logbook

October 17, 2020

Finding what is really consuming Oracle resources, combining top queries.

Filed under: Uncategorized — Helio Dias @ 1:41 pm
Tags: , ,

Top queries can be a bit misleading, since one query is not exactly on top but is consuming all kind of resources together.

When you combining all aspects that matter to you, giving a total rate, about time, CPU, IO, concurrency, memory.

The query below is based on force_matching_signature, and if you prefer, you can replace it to sql_id.

https://github.com/HelioDias/top_queries_oracle/blob/main/sqlarea

with
w0 as (select /*+materialize*/
force_matching_signature,count() number_of_sql_ids, sum(executions) executions,sum(rows_processed) rows_processed,trunc(sum(elapsed_time/1000000)) elapsed_seconds, trunc(max(elapsed_time/decode(executions,0,1,executions))/1000000,4) max_seconds_per_execution, trunc(sum(elapsed_time)/decode(sum(executions),0,1,sum(executions))/1000000,4) total_seconds_per_exec, stddev(elapsed_time/decode(executions,0,1,executions)) std_dev_secs_per_exec, sum(fetches) fetches,sum(parse_calls) parse_calls,sum(disk_reads) disk_reads,sum(direct_writes) direct_writes,sum(direct_reads) direct_reads, sum(buffer_gets) buffer_gets, max(buffer_gets/decode(executions,0,1,executions)) max_buffer_gets_per_exec, trunc(sum(application_wait_time/1000000),4) application_wait_seconds, trunc(sum(concurrency_wait_time/1000000),4) concurrency_wait_seconds, trunc(sum(cluster_wait_time/1000000),4) cluster_wait_seconds, trunc(sum(user_io_wait_time/1000000),4) user_io_wait_seconds, trunc(sum(cpu_time/1000000),4) cpu_seconds from v$sqlarea where force_matching_signature>0 group by force_matching_signature ), w1 as (select w0.,
rank() over(order by cpu_seconds desc) cpu_seq,
rank() over(order by user_io_wait_seconds desc) io_seq,
rank() over(order by cluster_wait_seconds desc) gc_seq,
rank() over(order by concurrency_wait_seconds desc) concurrency_seq,
rank() over(order by max_buffer_gets_per_exec desc) max_buffer_gets_exec_seq,
rank() over(order by buffer_gets desc) buffer_gets_seq,
rank() over(order by total_seconds_per_exec desc) total_seconds_per_exec_seq,
rank() over(order by elapsed_seconds desc) elapsed_seconds_seq,
rank() over(order by number_of_sql_ids desc) number_of_sql_ids_seq,
rank() over(order by executions desc) executions_seq
from w0
),
w2 as (select /*+materialize*/ * from w1 where cpu_seq<11 or io_seq<11 or gc_seq<11 or concurrency_seq<11 or max_buffer_gets_exec_seq<11 or
buffer_gets_seq<11 or total_seconds_per_exec_seq<11 or elapsed_seconds_seq<11 or
number_of_sql_ids_seq<11 or executions_seq<11)
select * from w2 order by cpu_seq+io_seq+gc_seq+concurrency_seq+max_buffer_gets_exec_seq+buffer_gets_seq+total_seconds_per_exec_seq+
elapsed_seconds_seq+number_of_sql_ids_seq
— executions is out of the sort on purpose, since it will be present somehow in other metrics
— if you want give more power to one metric just divide it by something e.g. cpu_seq/10+….
;

Further implementation will include the following dimensions, top last hour, top last day, top last month, an extra weight over queries currently running, amount of Exadata usage.

October 11, 2020

Oracle Exadata vs PostgreSQL “Storage Index”

Filed under: Uncategorized — Helio Dias @ 2:25 pm
Tags: , , , , ,

Is it possible that a tiny PostgreSQL beats Oracle Exadata on realm of Storage Index?
Yes. Exadata took 6 seconds executing a query over a billion rows, where PostgreSQL took less than 2 seconds doing the same query.

This it means that PostgreSQL is better than Exadata?
NO absolutely not, it just means that every tool has it place to shine.
And actually the reasons why this happen was because two factors:
I compared the first execution on both DB, and Exadata Storage Index is not persistent.
And I was aiming to filtering 1440 rows, So PostgreSQL wouldn’t have network issue with it. For sure above 2 millions of filtering Exadata would win, even on first executions.

Let’s clarify that Storage Index just really exists on Oracle, because Exadata have a intermediate tier (cell servers) that avoid sending trash rows to the DBMS layer. What PostgreSQL implement is a Block Range INdex BRIN, that it’s persistent and operate in Min/Max Block like Exadata, and is so small.

PostgreSQL targets the big tables since it BRIN is very very lightweight and so small that you don’t notice that it even exists (As long it is used properly).
On the other hand Exadata implementation is all about reduce the overload on the DB Nodes, having the performance improvement as side effect.

PostgreSQL is about planned things, where you must create in advance the BRINs, where Exadata is all about the unexpectable.

Below are the script for Exadata:
CREATE TABLE BILLION_BY_MINUTE PARALLEL NOLOGGING AS
WITH W0 AS (SELECT /+MATERIALIZE/1 FROM DUAL CASCADE CONNECT BY LEVEL<31623) /*This limiting was due to lack of memory on my autonomous database, so later I autojoin it to produce the billion rows*/
select sysdate-rownum/24/60 EG_DATE,
trunc(dbms_random.value(1,100000)) EG_VALUE
from (SELECT 1 FROM W0,W0 T2 WHERE ROWNUM<=1000000000);

select sum(eg_value) from BILLION_BY_MINUTE where eg_date>sysdate – 1

Execution time: 6.975 seconds — First execution
Execution time: 0.125 seconds — Following executions

PostgreSQL script:

CREATE TABLE BILLION_BY_MINUTE AS
with w0 as (
select * from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))
as q (c1))
,w1 as (select 1 from w0,w0 as t2,w0 as t3)
,w2 as (select 1 from w1,w1 as t2,W1 AS T3)
,w3 as (select tempo – linha interval ‘1 MINUTE’ as EG_DATE,trunc(random()*1000000) EG_value
from (select localtimestamp as tempo,row_number() over() as linha
from w2) tb1)
SELECT * FROM W3

create index BILLION_BY_MINUTE_brix on BILLION_BY_MINUTE using brin(eg_date);

select sum(eg_value) from BILLION_BY_MINUTE where eg_date>now() – interval ‘1 day’;


October 7, 2020

Improving the performance with RowId (3 to 6 times)

Filed under: Uncategorized — Helio Dias @ 5:41 am
Tags: , ,

Every access that your application does using primary keys needs to navigate on the index structure among the multiple levels (often between 2 and 4) and after that, very often one extra lookup on the table.
So generally speaking, we do 3 readings to access the data on the table.

Accessing the data with PK is like going to a place using their address, and with RowId is like having the precise GPS location.

With RowId we can access it with one single read.
On the application side with have some scenarios that often don’t care about this benefit e.g. a screen where just one row will be updated, and others more dramatic cases, where a batch of information will be changed.
Even for the single row update, using RowId can give an overall improvement on the throughput, since we can have thousands of users doing “just one row update”

To implement it, the application must bring the RowId together with general selects that will fill their local objects.

But have a catch on this, imagine after you load in memory your objects with it’s rowids, another application delete a row and then insert another row in the exactly same place, the rowid would still be same, but the data is now something completely unrelated.
And that’s why some developers don’t use rowids on their applications.

To overcome the consistence issue, it’s actually pretty straight forward, on the updates, the application must use RowID and the PK Together, then Oracle will use the RowId to find the data, and the PK to validate if the data still the same.

September 26, 2020

No more “snapshot too old” on Oracle (ora-01555)

Filed under: Uncategorized — Helio Dias @ 11:02 am

I use to joke around that the way to avoid this error is migrate to other RDBMS. but it’s really just joke.

Snapshot too old is not a bad implementation thing that just happen on Oracle databases, instead it’s a side effect of one the most elegant solution to deal with high concurrency environments.

Complaining that “Snapshot too old” it’s an Oracle issue, it’s like saying that spaceships are bad because they have lack of oxygen issues. The only reason why you know that you don’t have air in out space is because the spaceship allowed you to be there in the first place.

There are some ways to the users/DBAs deal with it, and I have an idea how Oracle could redesign an improvement for it, that in some cases would allow queries to run for months, without any Snapshot too old.

The key problem is that Oracle creates and keep Undo for everything, not just for the tables that you need for your longer query or the duration of their transaction, and deep down it’s a very good solution, because avoid many latches control on objects of queries that after some seconds or minutes it will finished anyway.

How Oracle could minimize or almost get rid of the issue, without having to control each query that starts on the database?
Oracle could monitor the Undo available, and at some point let’s say 50% of Undo, Oracle could evaluate the longest query against the second longest in execution and release the Undo for all blocks that belongs to the oldest SCN but are fresher the second longest, and of course belong to objects that the longest query are not using.

This way Oracle would get ride of large amount of Undo usage, since often one database have just one gigantic query in place.
But what if they have a secondary very long query? Oracle simple would repeat the process comparing the 2nd and 3rd query Objects/SCNs and so on, limiting to transactions longer that 1 hour (or some parameter to control this limit as well).

The advantage of my solution is that if the customer DB never come close to the threshold (let’s say 50% Undo), your database would never have any impact.

Currently the way to deal with it are:

1 – Improve your query speed (that majority of errors comes because a long running query, is making Oracle keep Undo since the beginning of the query.

2 – Increase the size of Undo tablespace and the retention period (recently versions allows you to even force the retention, in the past was just a goal)

3 – Avoid huge transaction, e.g. deleting at once a table that uses more space than your Undo.

4 – Using truncate instead of delete, even for partial deletion. e.g. you want to keep 10% of the table, create a stage table, insert what you want to keep there. truncate the main table, insert it back, and truncate the stage table. (Must consider the amount of keepers and also if the application will survive without the keepers for a while)

September 18, 2020

Cloud Exadata for free

Filed under: Uncategorized — Helio Dias @ 9:01 pm

Have you ever wonder, how it feels to be “piloting” an Oracle Exadata?

I started using daily Exadata since 2015, and can assure you, feels like being in a rocket, and now Oracle provide a lifetime Oracle Exadata for free, everyone should have one, not just for testing, but maybe to do some real hard calculations, I myself have 2 personal free cloud Exadata.

Just for the sake of comparison, I did a 10 billion rows average query on PostgreSQL, and it took 3 hours. On Exadata a 6 Billion rows average took 23 SECONDS.

The Oracle table had 161 GB.

select avg(lo_quantity) from ssb.lineorder;
25.49

Duration: 23 seconds

Maybe this column have tons of nulls?

Let’s check:

select COUNT(*) from ssb.lineorder WHERE LO_QUANTITY IS NOT NULL;

5999989709

🙂 no it is really just the power of Exadata.

There are indeed some limitation with this Cloud Exadata Free Lifetime, But the key aspect is that it brings this “lineorder” sample table with 161GB, which you can create your own tables around, then join with it, and see how much Exadata can do for your company.

May 21, 2014

Enqueue Waits in Oracle Database 10g

Filed under: Uncategorized — Helio Dias @ 8:04 pm

Alex Zeng's Blog

Enqueue waits events usually causes problems in busy Oracle databases. But oracle online document only have some of them until 11gR2, not sure why. I paste these information here for quick reference.

In Oracle Database 10g Release 1, each enqueue type is represented by its own wait event, making it much easier to understand exactly what type of enqueue the session is waiting for. You do not need to decipher the values from the P1, P2, P3, P1RAW, P2RAW, and P3RAW columns in the V$SESSION_WAIT or the V$SESSION view.

The following table lists all the enqueue waits in Oracle Database 10g Release 1 and describes what the enqueue is for. This information is available in the X$KSQST structure. The aggregated statistics for each of these enqueue types is displayed by the view V$ENQUEUE_STAT.

Enqueue Type Description
enq: AD – allocate AU Synchronizes accesses to a specific OSM (Oracle…

View original post 4,363 more words

April 1, 2014

Over-Configuring DBWR Processes – Part II.

Filed under: Uncategorized — Helio Dias @ 3:01 pm

Kevin Closson's Blog: Platforms, Databases and Storage

In my recent blog entry about over-configuring DBWR processes, I mentioned the processor cache effect of having too many DBWR processes doing work that a single DBWR process could handle. I’d like to provide more detail on the matter.

Backgrounder
Few people really understand what DBWR does for a living. Yes, everyone knows DBWR is responsible for flushing modified buffers from the SGA, but that is really high level. DBWR:

  • 1. Builds “write batches” by examining the state of buffer headers on the LRU lists. It scans the LRU end considering the age of the buffer. At instance boot time, each DBWR process is assigned a set of LRUs to tend to.
  • 2. Manipulates cache buffers chains elements for such reasons as marking the buffer busy when it is in flight for an I/O and marking it as written after the flush operation.
  • 3. Performs I/O using the OSDs…

View original post 1,755 more words

February 28, 2014

How to use OLAP inside an Oracle update

Filed under: Uncategorized — Helio Dias @ 2:48 pm

Yesterday a client told me she couldn´t find any example of this on google,

It really surprised me, so I decided produce an example of this:

UPDATE
APOIO4 B
SET B.COL_VAL = 10
WHERE ROWID IN
(SELECT ROWID FROM (SELECT ROWID,RANK() OVER (PARTITION BY COLPK1, COLPK2
ORDER BY COL4 DESC, COL5) SEQUENCIA
FROM APOIO4
WHERE COLFILTER1=’VD902′
AND COLFILTER2 = ‘100161’) WHERE SEQUENCIA=1);

 

June 30, 2013

Query to find out the ideal size for redo log

Filed under: Uncategorized — Helio Dias @ 4:06 am

It´s well know among DBAs that the redo log must few times  as possible, some say  that is more than 4 times per hour others says 2.

Now one must do is run the below query, adjusting the interval of switch (the 15 number) and voila, you will get a projection

of redo size per instance and group.

 

select Instancia “Instance”,g1 Group ,trunc(bytes/1024/1024/round((tempo2-tempo1)*60*24)*15,2) “Redo MB Projected for 15 switch”,bytes/1024/1024 “Actual Redo MB” from ( select t1.thread# Instancia,t1.group# g1,t1.first_time tempo1,min(t2.first_time) tempo2 from v$log t1,v$log t2 where t1.thread#=t2.thread# and t1.group#<>t2.group# and t2.first_time>t1.first_time group by t1.thread#,t1.group#,t1.first_time ) t1,v$log t2 where Instancia=thread# and g1=group# Order by 1,3;

If you want to improve this query, change the v$log to archives view, than you will have a wide view.

Hope it help.

May 13, 2013

Benefits of Oracle Block Change Tracking

Filed under: Uncategorized — Helio Dias @ 8:53 pm

Regarding of amount of block change and the size of your database, one can have a huge decrease in backup time.

To activate:

alter database enable block change tracking using file ‘/u02/oracle/bct/bct.log’;

To following it:

SELECT filename, status, bytes
FROM v$block_change_tracking;

Next Page »

Blog at WordPress.com.