Proactive Insights and Support For Open-Source Applications

Data Analytics

    9 Apr 2025 Recurring Kafka Connector Failures: Diagnosing and Preventing Message Corruption

    Problem: The client faced recurring Kafka sink connector failures (e.g., chf-cdr-sftp-sink-connector) in a Kubernetes environment (Kafka 3.2.0 with three brokers and ZooKeeper). The failures were caused by corrupt messages at specific offsets, leading to task crashes. Despite skipping corrupt offsets and restarting connectors, the issue persisted, requiring a more permanent solution. Process: Step 1: Environment […]

    Data Analytics
    19 Jan 2025 Apache Spark: Resolving Airflow Scheduler Heartbeat Issues in Production Environment

    Problem: The client reported continuous heartbeat issues in the Airflow scheduler that prevented controller DAGs from being generated in a production environment. This critical issue impacted job execution, especially when multiple jobs were triggered simultaneously, leading to timeouts and job failures. Process: Step 1: Initial Identification. The error message displayed in the logs indicated that the […]

    Data Analytics
    17 Jan 2025 Resolving Special Character Search Issues in Elasticsearch

    Problem: The client encountered an issue in their Elasticsearch setup where search results did not return exact matches when the search phrase included special characters, such as “:” (colon). This problem persisted despite using a custom indexing configuration with the `index_word_delimiter_graph_filter`. The client needed a solution to preserve special characters for exact matches while maintaining […]

    Data Analytics
    3 Jan 2025 Resolving Indexing Failures in OpenSearch During High Availability Testing

    Problem: The client implemented a 4-node OpenSearch cluster to ensure high availability for their application. When all four nodes were operational, both indexing and searching worked seamlessly. However, during a high availability test where two nodes were intentionally turned off, the indexing process stalled, and no documents were processed. Indexing resumed only after the two […]

    Data Analytics
    20 Dec 2024 Resolving Airflow DAG Triggering Issues

    Problem: The client’s operations team reported issues with triggering jobs via Apache Airflow, specifically through a custom solution, the dag_factory. While jobs triggered outside of the dag_factory worked without problems, those initiated through it were not being processed as expected. Attempts to gather logs in the Airflow UI yielded no entries, as the DAG triggering […]

    Data Analytics
    13 Dec 2024 Resolving Datastore Configuration Issues in CKAN for PostgreSQL Integration

    Problem: The client encountered issues with the data-explorer view functionality in their CKAN environment. While resources could be downloaded manually, the data-explorer view was unable to load. During the initial investigation, it was found that while the “datastore” plugin was enabled in the ckan.ini file, the ckan.datastore.write_url and ckan.datastore.read_url were not configured. The client was […]

    Data Analytics
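
For the CKAN entry above, the missing-settings check it describes can be sketched roughly as follows. This is a minimal illustration, not the expert's actual procedure: it assumes the settings live in the `[app:main]` section of ckan.ini (the usual CKAN layout, not stated in the excerpt), and the helper name `missing_datastore_urls`, the sample ini text, and the connection string are all hypothetical.

```python
import configparser

# The two settings named in the case study excerpt.
REQUIRED_KEYS = ("ckan.datastore.write_url", "ckan.datastore.read_url")


def missing_datastore_urls(ini_text, section="app:main"):
    """Return the datastore URL settings that are absent or empty.

    `section` defaults to "app:main", which is an assumption about
    where the CKAN settings are kept in the ini file.
    """
    # Interpolation is disabled so values such as %(here)s in a real
    # ckan.ini do not raise errors while parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return list(REQUIRED_KEYS)
    return [
        key
        for key in REQUIRED_KEYS
        if not parser.get(section, key, fallback="").strip()
    ]


# Illustrative config: the plugin is enabled but only one URL is set.
SAMPLE_INI = """
[app:main]
ckan.plugins = datastore
ckan.datastore.write_url = postgresql://ckan_default:pass@localhost/datastore_default
"""

if __name__ == "__main__":
    for key in missing_datastore_urls(SAMPLE_INI):
        print(f"not configured: {key}")
```

Running a check like this against a real ckan.ini (read from disk and passed in as text) would flag the situation in the excerpt, where the "datastore" plugin is enabled but neither URL is set.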
    22 Nov 2024 Managing Out-of-Memory (OOM) Errors and Optimizing Shard Configuration in OpenSearch Production Environment

    Problem: In the production environment of a multi-node OpenSearch cluster, the nodes frequently crashed due to Out-of-Memory (OOM) errors. Initially, the heap size was increased from 16 GB to 30 GB based on IBM’s recommendations, but the problem persisted. IBM further suggested increasing the number of shards from 16 to 64 to mitigate memory overload. […]

    Data Analytics
    15 Nov 2024 Resolving HBase Region Transition and Hadoop File System Permission Issues in a PROD Environment

    Problem: The client encountered a critical issue in their production environment involving HBase regions stuck in a transition state. This problem resulted in service disruptions within their Hadoop cluster. The issue was exacerbated by file system permission changes following a cold restart of the cluster, leading to difficulties in accessing data and managing HBase operations. […]

    Data Analytics
    8 Nov 2024 Upgrade of Elasticsearch from Version 7.15 to 7.17

    Problem: The client requested assistance with upgrading their Elasticsearch installation from version 7.15 to 7.17. The client sought a detailed step-by-step guide and expressed the need for a meeting to clarify the upgrade process. Process: Upon receiving the request, the expert requested additional details about the client’s current Elasticsearch setup, including information on the cluster, […]

    Data Analytics
    © HOSSTED 2025 All rights reserved