# Kernel Null Pointer Dereference in CAS Cache with RAID10 Devices #1671

## Bug Summary
CAS Cache version 25.03.0.0963.release crashes the kernel with a **null pointer dereference** when a cache instance is started on an MD RAID10 device. The crash occurs consistently during the discard phase of cache initialization.

## Environment
- **Operating System**: Ubuntu 22.04.5 LTS
- **Linux Kernel**: 5.15.0-144-generic
- **CAS Cache Version**: 25.03.0.0963.release
- **Device Type**: MD RAID10 array (4x NVMe devices)
- **RAID Configuration**: 
  - Level: RAID10 
  - Layout: near=2
  - Chunk Size: 512K
  - Devices: 4x NVMe drives

## Steps to Reproduce
1. Create a RAID10 array with 4 NVMe devices:
   ```bash
   mdadm --create /dev/md1 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
   ```

2. Clean the device:
   ```bash
   wipefs -a /dev/md1
   dd if=/dev/zero of=/dev/md1 bs=1M count=10
   ```

3. Attempt to create CAS cache instance:
   ```bash
   casadm -S -i 1 -d /dev/disk/by-id/md-uuid-[uuid]
   ```

## Expected Behavior
CAS cache instance should be created successfully without system crashes.

## Actual Behavior
- The `casadm` command hangs and does not return (observed for more than 5 minutes)
- **Kernel crashes with null pointer dereference**
- System becomes unstable

## Crash Details

### Kernel Panic Stack Trace
```
[ 4307.057075] CR2: 0000000000000000
[ 4307.057077] ---[ end trace e4a25646554913d5 ]---
[ 4307.118416] RIP: 0010:0x0
[ 4307.118421] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 4307.118423] RSP: 0018:ffffa451cd92b948 EFLAGS: 00010206
[ 4307.118425] RAX: 0000000000000000 RBX: 0000000000092800 RCX: 0000000000000001
```

### Call Stack Analysis
The crash occurs in the CAS Cache discard operation chain:
```
block_dev_forward_discard+0x184/0x290 [cas_cache]
ocf_volume_forward_discard+0x4d/0x80 [cas_cache]
ocf_req_forward_cache_discard+0x39/0x50 [cas_cache]
ocf_submit_cache_discard+0xa0/0x130 [cas_cache]
_ocf_mngt_attach_discard+0x7b/0xf0 [cas_cache]
_ocf_pipeline_run_step+0xeb/0x170 [cas_cache]
ocf_queue_run+0xf3/0x110 [cas_cache]
_cas_io_queue_thread+0x6f/0x110 [cas_cache]
```
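To pull the same module frames out of a full kernel log, a small filter like the following may help. This is a minimal sketch: the function name `cas_frames` is hypothetical, and it assumes the log uses the usual `symbol+0xoffset/0xsize [module]` frame format shown above.

```bash
#!/usr/bin/env bash
# Sketch: extract cas_cache frames from a kernel log read on stdin.
# Assumption: frames look like "symbol+0xOFF/0xSIZE [cas_cache]".

cas_frames() {
    # Match "symbol+0xOFF/0xSIZE [cas_cache]", then drop the module tag.
    grep -oE '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+ \[cas_cache\]' \
        | sed 's/ \[cas_cache\]$//'
}
```

Typical use would be `dmesg | cas_frames`. Each printed token is in the `func+offset/size` form accepted by the kernel source tree's `scripts/faddr2line`, assuming a `cas_cache.ko` built with debug info is available to resolve it against.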

### Root Cause Analysis
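- **Error Type**: Null pointer dereference (`CR2: 0000000000000000`)
- **Location**: Instruction fetch at address 0x0 (`RIP: 0010:0x0`), which indicates an indirect call through a NULL function pointer
- **Module**: `cas_cache` discard handling code; the innermost `cas_cache` frame in the trace is `block_dev_forward_discard+0x184`
- **Trigger**: Discard issued to the RAID10 device during cache initialization (`_ocf_mngt_attach_discard`)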

## Technical Details

### RAID10 Device Information
```
md1 : active raid10 nvme3n1[3] nvme2n1[2] nvme0n1[0] nvme1n1[1]
      4000532480 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 0/30 pages [0KB], 65536KB chunk
```

### Process State
```bash
$ ps aux | grep casadm
root  55713  0.0  0.0  12064  2128 pts/1  Sl+  14:57  0:00 casadm -S -i 1 -d /dev/disk/by-id/md-uuid-[uuid]
```

## Reproduction Rate
- **100% reproducible** across multiple attempts
- **Multiple systems affected** (tested on storage01, storage03)
- **Consistent crash location** in discard operation chain

## Workarounds Attempted
1. **Using `--force` flag**: Still crashes
2. **Using `--cache-mode wt`**: Still crashes  
3. **Using `--no-flush`**: Still crashes
4. **Different by-id paths**: Still crashes

## Impact Assessment
- **Severity**: Critical - Kernel crash/system instability
- **Scope**: RAID10 devices with CAS Cache 25.03.0.0963.release
- **Data Safety**: No data corruption observed, but system requires reboot

## Suggested Investigation Areas
1. **Null function pointer** in CAS Cache discard handling code
2. **RAID10-specific discard operation** compatibility 
3. **Memory management** in `block_dev_forward_discard` function
4. **Race condition** during cache initialization with RAID10 devices
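
For the second area, one quick data point is what discard limit the md array advertises through sysfs. The sketch below reads the standard `queue/discard_max_bytes` attribute. The function names and the `SYSFS_ROOT` override are hypothetical helpers, and whether the advertised limit is related to this crash is not established.

```bash
#!/usr/bin/env bash
# Sketch: report whether a block device advertises discard support via sysfs.
# Assumption: the device name (e.g. "md1") matches the array created in step 1.

# Succeed only if the given discard_max_bytes value is a positive integer.
advertises_discard() {
    local max_bytes="$1"
    [ -n "$max_bytes" ] && [ "$max_bytes" -gt 0 ] 2>/dev/null
}

# Print the discard limit for a device name; SYSFS_ROOT is overridable for testing.
show_discard_limit() {
    local dev="$1"
    local file="${SYSFS_ROOT:-/sys}/block/${dev}/queue/discard_max_bytes"
    local max=""
    [ -r "$file" ] && max="$(cat "$file")"
    if advertises_discard "$max"; then
        echo "${dev}: discard advertised (discard_max_bytes=${max})"
    else
        echo "${dev}: discard not advertised"
    fi
}
```

Running `show_discard_limit md1` and then the same for each member (`nvme0n1` through `nvme3n1`) would show whether the array and its NVMe members report different discard limits.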

## Additional Notes
- Regular block devices (non-RAID) may not be affected
- This appears to be a regression or compatibility issue specific to RAID10
- The crash occurs during cache initialization, not during normal I/O operations

## Request
Please investigate this critical kernel crash bug. The null pointer dereference in the discard handling path makes CAS Cache unusable with RAID10 devices in the current release.

**Would you like crash dumps or additional debugging information?**
