# bpftrace and eBPF Tools Guide

## Table of Contents
1. [Introduction](#introduction)
2. [System Layer Overview](#system-layer-overview)
3. [Tool Categories](#tool-categories)
4. [Detailed Tool Analysis](#detailed-tool-analysis)
5. [Command Reference](#command-reference)
6. [Performance Considerations](#performance-considerations)
7. [Common Use Cases](#common-use-cases)

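## Introduction

This guide surveys bpftrace and BCC (eBPF) command-line tools for Linux system analysis and performance monitoring, grouped by the layer of the system stack they observe. Many tools exist in two variants: a BCC version that accepts command-line options, and a bpftrace version (often installed with a `.bt` suffix) that usually takes none. Installed names and option sets differ between distributions and releases, so confirm each flag with the tool's `-h` output or man page before relying on it.
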
## System Layer Overview

The tools are organized across these main layers:
1. Applications & Runtimes
2. System Libraries
3. System Call Interface
4. Kernel Subsystems:
   - VFS (Virtual File System)
   - Network Stack (Sockets, TCP/UDP, IP)
   - Scheduler
   - Virtual Memory
   - Device Drivers

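To explore which probes exist at a given layer, `bpftrace -l` accepts a wildcard pattern. The sketch below maps some of the layers above to probe patterns. The function names `layer_probe_pattern` and `list_layer_probes` are hypothetical, the patterns are illustrative rather than exhaustive, and the probes actually present depend on the kernel build.

```bash
# Map a layer name to a bpftrace probe pattern (illustrative choices).
layer_probe_pattern() {
    case "$1" in
        syscalls)  echo 'tracepoint:syscalls:*' ;;
        vfs)       echo 'kprobe:vfs_*' ;;
        network)   echo 'tracepoint:tcp:*' ;;
        scheduler) echo 'tracepoint:sched:*' ;;
        memory)    echo 'tracepoint:kmem:*' ;;
        *)         echo "unknown layer: $1" >&2; return 1 ;;
    esac
}

# List matching probes only when bpftrace is installed and we are root;
# otherwise print the command that would be run.
list_layer_probes() {
    pattern=$(layer_probe_pattern "$1") || return 1
    if command -v bpftrace >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
        bpftrace -l "$pattern"
    else
        echo "would run: bpftrace -l '$pattern'"
    fi
}
```

For example, `list_layer_probes scheduler` lists the scheduler tracepoints on a machine where bpftrace is available.
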
## Tool Categories

### Application Level Tools
| Tool | Purpose | Layer |
|------|---------|-------|
| opensnoop | Trace file opens | Application |
| statsnoop | Trace stat() syscalls | Application |
| syncsnoop | Trace sync operations | Application |
| bashreadline | Trace bash commands | Application |
| gethostlatency | DNS latency analysis | System Libraries |

### System Call Interface Tools
| Tool | Purpose | Layer |
|------|---------|-------|
| syscount | Count syscalls | System Call |
| execsnoop | Trace new processes | System Call |
| killsnoop | Trace kill() syscalls | System Call |
| pidpersec | New processes per second | System Call |

### File System Tools
| Tool | Purpose | Layer |
|------|---------|-------|
| vfscount | VFS operation counts | VFS |
| vfsstat | VFS operation stats | VFS |
| writeback | Trace file writeback | File Systems |
| xfsdist | XFS operation latency | File Systems |
| mdflush | Trace md RAID flush events | Volume Manager |

### Block Device Tools
| Tool | Purpose | Layer |
|------|---------|-------|
| biosnoop | Trace block I/O | Block Device |
| biolatency | Block I/O latency | Block Device |
| bitesize | Block I/O size analysis | Block Device |

### Network Tools
| Tool | Purpose | Layer |
|------|---------|-------|
| tcpconnect | Trace TCP connections | TCP/UDP |
| tcpaccept | Trace TCP accepts | TCP/UDP |
| tcpretrans | Trace TCP retransmits | TCP/UDP |
| tcpdrop | Trace TCP drops | TCP/UDP |

### CPU/Scheduler Tools
| Tool | Purpose | Layer |
|------|---------|-------|
| cpuwalk | Sample which CPUs processes run on | Scheduler |
| runqlat | Run queue latency | Scheduler |
| runqlen | Run queue length | Scheduler |
| offcputime | Off-CPU analysis | Scheduler |

### Memory Management Tools
| Tool | Purpose | Layer |
|------|---------|-------|
| oomkill | Trace OOM killer | Virtual Memory |
| capable | Trace capability checks | System |

## Detailed Tool Analysis

### Application Monitoring Tools

#### opensnoop
```bash
# Trace all file opens
opensnoop

# Trace a specific process
opensnoop -p 1234

# Include timestamps
opensnoop -T

# Filter by process name (-n matches the process name, not the file name)
opensnoop -n nginx
```

#### statsnoop
```bash
# Trace all stat() calls
statsnoop

# Show failed stats only
statsnoop -x

# Filter by process ID
statsnoop -p 1234

# Include timestamps
statsnoop -t
```

#### bashreadline
```bash
# Trace commands entered in all running bash shells
bashreadline

# BCC version: read symbols from a shared libreadline instead of the bash binary
# (the path is an example; adjust it for your system)
bashreadline -s /lib/libreadline.so
```

### Network Analysis Tools

#### tcpconnect
```bash
# Trace all TCP connections
tcpconnect

# Trace a specific process
tcpconnect -p 1234

# Include timestamps
tcpconnect -t

# Filter by destination port
tcpconnect -P 80
```

#### tcpretrans
```bash
# Trace TCP retransmissions (the TCP state is part of the default output)
tcpretrans

# Include tail loss probe attempts
tcpretrans -l

# Count retransmits per flow instead of printing each event
tcpretrans -c
```

### File System Analysis

#### vfscount
```bash
# Count VFS calls until Ctrl-C
vfscount

# BCC version: count for 10 seconds, then print and exit
vfscount 10
```

#### writeback
```bash
# Trace file system writeback events.
# writeback ships as a bpftrace script and takes no options;
# depending on the package it may be installed as writeback.bt.
writeback
```

### Block Device Analysis

#### biosnoop
```bash
# Trace block I/O (the default output includes the process name and PID)
biosnoop

# Include OS queued time
biosnoop -Q

# Filter by disk (recent BCC versions)
biosnoop -d sda
```

#### biolatency
```bash
# Show block I/O latency as a histogram (microseconds by default)
biolatency

# Use millisecond units
biolatency -m

# Print a separate histogram per disk
biolatency -D

# Filter by disk (recent BCC versions)
biolatency -d sda
```

### CPU and Scheduler Analysis

#### runqlat
```bash
# Show run queue latency as a histogram (microseconds by default)
runqlat

# Use millisecond units
runqlat -m

# Trace a single process
runqlat -p 1234

# Print a histogram every second, five times
runqlat 1 5
```

#### offcputime
```bash
# Trace off-CPU time until Ctrl-C
offcputime

# Filter by process
offcputime -p 1234

# Trace for 10 seconds (the duration is a positional argument)
offcputime 10

# Show user-space stacks only
offcputime -U
```

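## Command Reference

### General Options
The bpftrace versions of these tools generally take no options. The BCC versions share the conventions below, but individual flags vary by tool, so confirm them with `-h`:
```bash
-h                 # Show help message
-p PID             # Filter by process ID
-t / -T            # Include timestamps (the letter varies by tool)
-x                 # Show only failed calls (snoop-style tools)
[interval [count]] # Positional: seconds between outputs and number of outputs (histogram tools)
```
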
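Because option sets differ between tools and versions, it can help to check a tool's own help text before scripting around a flag. A minimal sketch, assuming the tool prints its options when run with `-h`; the helper name `supports_flag` is hypothetical:

```bash
# Return success if TOOL's help output mentions FLAG as a whole word.
supports_flag() {
    tool=$1
    flag=$2
    # Some tools print help to stdout, others to stderr; capture both.
    # -F treats the flag as a literal string, -w avoids matching "-p" inside "--pid".
    "$tool" -h 2>&1 | grep -q -w -F -e "$flag"
}

# Example: only pass -Q if the installed biosnoop knows about it.
# if supports_flag biosnoop -Q; then biosnoop -Q; else biosnoop; fi
```
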
### Advanced Usage

#### Custom Scripts
```bash
# Create a custom bpftrace script
cat > custom.bt << 'EOF'
#!/usr/bin/env bpftrace
// Print the process name and path for each openat() call.
// Modern libc implements open() via openat(), so this tracepoint sees most opens.
tracepoint:syscalls:sys_enter_openat
{
    printf("%s opened %s\n", comm, str(args->filename));
}
EOF

# Run the custom script (requires root)
sudo bpftrace custom.bt
```

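Printing one line per event can be noisy on a busy system. As a variation, the sketch below counts opens per process in a bpftrace map and stops after ten seconds. It assumes the `syscalls:sys_enter_openat` tracepoint is available; the file name `opencount.bt` and the map name `@opens` are arbitrary choices for illustration.

```bash
# Write a bpftrace script that aggregates in the kernel instead of printing per event.
cat > opencount.bt << 'EOF'
#!/usr/bin/env bpftrace
// Count openat() calls per process name.
tracepoint:syscalls:sys_enter_openat
{
    @opens[comm] = count();
}

// Stop after 10 seconds; bpftrace prints all maps on exit.
interval:s:10
{
    exit();
}
EOF

# Run it (requires root and bpftrace):
# sudo bpftrace opencount.bt
```
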
#### Performance Monitoring
```bash
# Print syscall counts every second
syscount -i 1

# Print the number of new processes per second (takes no options)
pidpersec

# Track OOM kills (takes no options; each event is printed with a timestamp)
oomkill
```

### Best Practices

1. **Resource Usage**
   - Be cautious with stack traces in production; collecting them adds work to every traced event
   - Use sampling for high-frequency events
   - Monitor overhead with top/htop

2. **Filtering**
   - Use specific filters (PID, process name, port) to reduce overhead
   - Combine multiple conditions when possible
   - Limit how long a tool runs instead of tracing indefinitely

3. **Output Control**
   - Prefer histograms or counts over per-event output when events are frequent
   - Consider logging to files for later analysis
   - Use aggregation for high-volume data

4. **Troubleshooting**
   - Start with broad tools
   - Narrow down to specific events
   - Use multiple tools for correlation

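Two of the practices above, bounded run time and logging to a file, can be combined in a small wrapper. This is a sketch that assumes GNU coreutils `timeout` is installed; the function name `run_limited` and the log path are hypothetical.

```bash
# Run a tracing command for at most SECONDS and append its output to LOGFILE.
run_limited() {
    seconds=$1
    logfile=$2
    shift 2
    # Send SIGINT at the deadline so tools that print a summary on Ctrl-C
    # still get the chance to do so.
    timeout -s INT "$seconds" "$@" >> "$logfile" 2>&1
}

# Example (as root): trace file opens for 30 seconds into a log file.
# run_limited 30 /tmp/opens.log opensnoop
```

When the time limit is reached, `timeout` exits with status 124, which a calling script can use to tell a deadline apart from a tool failure.
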
## Performance Considerations

### Overhead Management
```bash
# Bound the run: print one 10-second histogram, then exit
biolatency 10 1

# Filter in the kernel by process to cut event volume
opensnoop -p 1234

# Cap stack trace storage (offcputime supports --stack-storage-size)
offcputime --stack-storage-size 1024 -p 1234 10
```

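Sampling is another way to keep overhead predictable: instead of instrumenting every event, a timed probe fires at a fixed rate. The sketch below uses bpftrace's `profile` probe at 49 Hz per CPU; the rate, the file name `cpusample.bt`, and the map name `@samples` are illustrative choices.

```bash
# Write a bpftrace script that samples at a fixed rate instead of tracing every event.
cat > cpusample.bt << 'EOF'
#!/usr/bin/env bpftrace
// Fire 49 times per second on each CPU and count which process is running.
profile:hz:49
{
    @samples[comm] = count();
}

// Stop after 10 seconds; bpftrace prints the map on exit.
interval:s:10
{
    exit();
}
EOF

# Run it (requires root and bpftrace):
# sudo bpftrace cpusample.bt
```

The cost of this script scales with the sample rate and CPU count rather than with the workload's event rate.
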
### Production Usage
1. Test tools in development first
2. Use appropriate filtering
3. Monitor system impact
4. Set appropriate buffer sizes
5. Use time-based execution limits

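One way to monitor system impact is the kernel's own BPF run-time statistics: with the `kernel.bpf_stats_enabled` sysctl set to 1, `bpftool prog show` reports `run_time_ns` and `run_cnt` for each loaded program. The sketch below derives an average cost per invocation from that output. The helper name `avg_ns_per_run` is hypothetical, and the parsing assumes both fields appear on the same line as the program ID.

```bash
# Read `bpftool prog show` output on stdin and print "<id>: <avg ns per run>"
# for every program line that carries run_time_ns and run_cnt.
avg_ns_per_run() {
    awk '{
        t = ""; c = ""
        for (i = 1; i < NF; i++) {
            if ($i == "run_time_ns") t = $(i + 1)
            if ($i == "run_cnt")     c = $(i + 1)
        }
        # Skip lines without statistics or with a zero run count.
        if (t != "" && c > 0) printf "%s %d\n", $1, t / c
    }'
}

# Example (as root):
# sysctl -w kernel.bpf_stats_enabled=1
# bpftool prog show | avg_ns_per_run
```

Collecting these statistics has a small cost of its own, so it is reasonable to switch the sysctl back to 0 after measuring.
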
## Common Use Cases

### Performance Analysis
```bash
# Analyze disk I/O: per-disk latency histograms, then per-event detail with queue time
biolatency -D
biosnoop -Q

# Network performance: retransmits (including tail loss probes) and new connections
tcpretrans -l
tcpconnect -t

# CPU scheduling: run queue latency and off-CPU time for one process
runqlat
offcputime -p 1234
```

### Troubleshooting
```bash
# File system issues
opensnoop -T
vfscount

# Network problems
tcpdrop
tcpretrans

# Memory issues
oomkill
```

### Security Monitoring
```bash
# Track capability checks (verbose: include non-audit checks)
capable -v

# Monitor process creation with timestamps
execsnoop -t

# Track file access with timestamps
opensnoop -T
```