Problem:

After deleting 52 million rows from a table, the client performed a VACUUM FULL and REINDEX on the table. Following these maintenance operations, the performance of large joins involving this table has significantly deteriorated, with query execution times increasing from minutes to hours. The task is to identify the cause of this performance regression and recommend solutions to restore the original performance levels.

The expert team provided the initial investigation and scheduled the meeting with the client. At the meeting, the expert team discussed the following topics:

  • The client has a billing system with millions of records, and each month they delete extra records.
  • The client has upgraded from Patroni version 11.5-11.18 and after that, their performance decreased whenever the client ran a merge query then it took hours rather than minutes. The client was avoiding it but wanted it to be solved.
  • The client shared the query, DB logs, and everything with the expert via emails for investigation.
  • Solution:

    To improve query performance, experts suggested to the client the following optimizations:

  • Index Optimization: Ensured that tables had appropriate indexes on columns used in filters and joins to speed up data retrieval.
  • Partitioning: Aligned the partitioning scheme with query predicates to minimize partition scans.
  • Limit Result Set: Added a LIMIT clause to reduce the data volume for sorting and joining.
  • Rewrite the Query: Tried using Nested Loop Joins or Hash Joins instead of Merge Joins, depending on data distribution and join conditions.
  • Aggregate Data: Used materialized views or summary tables for pre-aggregated data to reduce processing.
  • Optimize Sort: Increased work_mem for sorting operations and optimized the Sort Key to reduce sorted columns.
  • Analyze Query Plan: Reviewed the query plan to identify and address potential bottlenecks or costly operations.
  • Also, the expert team provided the client with the updated query as an example to improve performance:

    SELECT /*+ ORDERED INDEX (CU, cdr_unbilled_xc_ccnt_n_trns) USE_NL(CD CU) */
        CU.cdr_data_partition_key,
        CU.type_id_usg,
        CU.element_id,
        CU.provider_id,
        CU.point_target,
        CU.point_origin,
        CU.point_id_target,
        CU.point_id_origin,
        CU.trans_dt,
        CU.msg_id,
        CU.msg_id2,
        CU.msg_id_serv,
        CU.rated_units,
        CU.jurisdiction,
        CU.amount,
        CU.profile_id,
        CU.customer_tag,
        CU.rate_period,
        CU.point_tax_code_origin,
        CU.point_tax_code_type_origin,
        CU.point_tax_code_target,
        CU.point_tax_code_type_target,
        CU.base_amt,
        CU.unrounded_amount,
        CU.bill_class,
        CU.zone_class,
        CU.ext_tracking_id,
        CU.annotation,
        CU.rev_rcv_cost_ctr,
        CU.aggr_usage_id,
        CU.rate_currency_code,
        CU.primary_units,
        CU.second_units,
        CU.third_units,
        CU.corridor_plan_id,
        CU.split_row_num,
        CU.open_item_id,
        COALESCE(num_records, 1),
        CU.amount_reduction,
        CU.amount_reduction_id,
        CU.access_region_origin,
        CU.access_region_target,
        CU.subscr_no,
        CU.subscr_no_resets,
        CU.tax_pkg_count,
        CU.geocode,
        CU.aux_tax_info,
        CU.tax_rate_active_dt,
        CU.tax_rate_inactive_dt,
        CU.add_implied_decimal,
        CU.rate_class,
        CU.point_class_origin,
        CU.point_class_target,
        CU.component_id,
        CU.equip_class_code,
        CU.lob_id, --CBS00182254
        CU.bill_order_number
    FROM arbor.CDR_UNBILLED CU
    JOIN arbor.CDR_DATA CD ON CU.account_no = CD.account_no
        AND CU.cdr_data_partition_key = CD.cdr_data_partition_key
        AND CU.msg_id = CD.msg_id
        AND CU.msg_id2 = CD.msg_id2
        AND CU.msg_id_serv = CD.msg_id_serv
        AND CU.split_row_num = CD.split_row_num
    WHERE CU.account_no = 251753520
        AND CU.trans_dt >= trunc(current_date - 20)
        AND CU.trans_dt < trunc(current_date + 20)
        AND CD.cdr_status = 3
        AND CD.no_bill = 0
        AND (1 = 0 OR (1 = 1 AND CU.rate_dt < trunc(current_date + 20)));

    Conclusion:

    The client faced significant performance degradation in their billing system after deleting 52M rows and running VACUUM FULL and REINDEX on a table. Queries that previously took minutes now took hours. Following the upgrade from Patroni 11.5 to 11.18, merge queries experienced increased runtimes. The expert team investigated the issue based on shared queries and DB logs. During the meeting, our experts discussed optimizations like index adjustments, partitioning alignment, limiting result sets, query rewriting, data aggregation, sort optimization, and query plan analysis to restore and enhance performance.