Why Monitor Proxy Performance
Proxy infrastructure fails silently. Your scraper can run for hours at a 40% success rate before anyone notices. Response times creep up, block rates rise, and data quality degrades, all without triggering obvious errors. Monitoring turns these invisible problems into actionable alerts.
This guide shows you how to instrument your proxy requests, collect meaningful metrics, build dashboards, and set up alerting that catches degradation before it affects your data pipeline. All examples use ProxyHat's residential proxies and are production-ready.
If you do not measure your proxy performance, you are guessing. At scale, guessing costs money and produces unreliable data.
Key Metrics to Track
| Metric | What it tells you | Alert threshold |
|---|---|---|
| Success rate | Percentage of requests that return a 2xx status | Below 90% |
| Response time (p50/p95/p99) | How quickly requests complete | p95 above 10 s |
| Error rate by type | Which errors dominate (timeout, 403, 429, connection) | Any single type above 15% |
| Requests per second | Throughput of your scraping pipeline | Unexpected deviation from baseline |
| Bandwidth usage | Data transferred through the proxy | Approaching the plan limit |
| Block rate by target | Which targets block you the most | Above 20% for any target |
| Retry rate | How many requests need retries | Above 30% |
| Session reuse efficiency | How long sticky sessions survive | Below 5 requests on average |
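Two of these metrics, error rate by type and block rate by target, are simple ratios over counters. The following is a minimal sketch of how they might be derived, assuming counters shaped like the `errors_by_type` and `requests_by_target` dictionaries used by the Python client in this guide. The function names and sample counts are hypothetical, and every failed request is treated as a block, which is a simplification.

```python
def error_share_by_type(errors_by_type: dict, total_requests: int) -> dict:
    """Return each error type's share of all requests, as a percentage."""
    if total_requests <= 0:
        return {}
    return {
        err: count * 100 / total_requests
        for err, count in errors_by_type.items()
    }


def block_rate_by_target(requests_by_target: dict) -> dict:
    """Return the percentage of failed requests per target domain."""
    rates = {}
    for domain, counts in requests_by_target.items():
        total = counts["success"] + counts["failed"]
        rates[domain] = counts["failed"] * 100 / total if total else 0.0
    return rates


# Hypothetical counts, for illustration only.
errors = {"HTTP_403": 30, "timeout": 10}
targets = {
    "example.com": {"success": 90, "failed": 10},
    "example.org": {"success": 60, "failed": 40},
}

print(error_share_by_type(errors, 200))  # {'HTTP_403': 15.0, 'timeout': 5.0}
print(block_rate_by_target(targets))     # {'example.com': 10.0, 'example.org': 40.0}
```

With these sample numbers, `HTTP_403` sits exactly at the 15% threshold and `example.org` is well above the 20% block-rate threshold from the table.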
Python: Instrumented Proxy Client
This client wraps every request with timing, status tracking, and structured logging.
import logging
import statistics
import time
import uuid
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

import requests

logger = logging.getLogger("proxy_monitor")


@dataclass
class ProxyMetrics:
    """Collects and aggregates proxy performance metrics."""

    total_requests: int = 0
    successful: int = 0
    failed: int = 0
    retries: int = 0
    latencies: list = field(default_factory=list)
    status_codes: dict = field(default_factory=lambda: defaultdict(int))
    errors_by_type: dict = field(default_factory=lambda: defaultdict(int))
    bytes_transferred: int = 0
    requests_by_target: dict = field(
        default_factory=lambda: defaultdict(lambda: {"success": 0, "failed": 0})
    )

    @property
    def success_rate(self) -> float:
        return (self.successful / self.total_requests * 100) if self.total_requests > 0 else 0.0

    @property
    def p50_latency(self) -> float:
        return statistics.median(self.latencies) if self.latencies else 0.0

    @property
    def p95_latency(self) -> float:
        if not self.latencies:
            return 0.0
        sorted_lat = sorted(self.latencies)
        idx = int(len(sorted_lat) * 0.95)
        return sorted_lat[min(idx, len(sorted_lat) - 1)]

    @property
    def p99_latency(self) -> float:
        if not self.latencies:
            return 0.0
        sorted_lat = sorted(self.latencies)
        idx = int(len(sorted_lat) * 0.99)
        return sorted_lat[min(idx, len(sorted_lat) - 1)]

    def summary(self) -> dict:
        return {
            "total_requests": self.total_requests,
            "success_rate": f"{self.success_rate:.1f}%",
            "p50_latency": f"{self.p50_latency:.3f}s",
            "p95_latency": f"{self.p95_latency:.3f}s",
            "p99_latency": f"{self.p99_latency:.3f}s",
            "retries": self.retries,
            "bytes_transferred": self.bytes_transferred,
            # Five most frequent error types, most frequent first.
            "top_errors": dict(sorted(
                self.errors_by_type.items(),
                key=lambda x: x[1], reverse=True
            )[:5]),
            "status_distribution": dict(self.status_codes),
        }


class MonitoredProxyClient:
    """HTTP client with built-in proxy monitoring."""

    def __init__(self, max_retries: int = 3):
        self.metrics = ProxyMetrics()
        self.max_retries = max_retries
        self._alert_callbacks = []

    def on_alert(self, callback):
        """Register a callback for metric alerts."""
        self._alert_callbacks.append(callback)

    def _check_alerts(self):
        # Skip evaluation until there are enough samples to be meaningful.
        if self.metrics.total_requests < 10:
            return
        alerts = []
        if self.metrics.success_rate < 90:
            alerts.append(f"Low success rate: {self.metrics.success_rate:.1f}%")
        if self.metrics.p95_latency > 10:
            alerts.append(f"High p95 latency: {self.metrics.p95_latency:.1f}s")
        if self.metrics.retries / max(self.metrics.total_requests, 1) > 0.3:
            alerts.append(f"High retry rate: {self.metrics.retries}/{self.metrics.total_requests}")
        for alert in alerts:
            logger.warning("ALERT: %s", alert)
            for cb in self._alert_callbacks:
                cb(alert)

    def fetch(self, url: str, country: Optional[str] = None) -> Optional[requests.Response]:
        from urllib.parse import urlparse

        target_domain = urlparse(url).netloc
        for attempt in range(self.max_retries + 1):
            # New session ID per attempt, so a retry does not reuse the previous session.
            session_id = uuid.uuid4().hex[:8]
            username = f"USERNAME-session-{session_id}"
            if country:
                username += f"-country-{country}"
            proxy = f"http://{username}:PASSWORD@gate.proxyhat.com:8080"

            # total_requests counts attempts, so every retry lowers the success rate.
            self.metrics.total_requests += 1
            if attempt > 0:
                self.metrics.retries += 1

            start = time.perf_counter()
            try:
                response = requests.get(
                    url,
                    proxies={"http": proxy, "https": proxy},
                    timeout=30,
                )
                latency = time.perf_counter() - start
                self.metrics.latencies.append(latency)
                self.metrics.status_codes[response.status_code] += 1

                if response.status_code >= 400:
                    self.metrics.errors_by_type[f"HTTP_{response.status_code}"] += 1
                    self.metrics.requests_by_target[target_domain]["failed"] += 1
                    # Retry block-like responses with exponential backoff (1s, 2s, 4s, ...).
                    if response.status_code in (403, 429, 503) and attempt < self.max_retries:
                        time.sleep(2 ** attempt)
                        continue
                    self.metrics.failed += 1
                else:
                    self.metrics.successful += 1
                    self.metrics.bytes_transferred += len(response.content)
                    self.metrics.requests_by_target[target_domain]["success"] += 1

                self._check_alerts()
                return response

            except requests.exceptions.Timeout:
                self.metrics.errors_by_type["timeout"] += 1
                self.metrics.latencies.append(time.perf_counter() - start)
                self.metrics.requests_by_target[target_domain]["failed"] += 1
            except requests.exceptions.ConnectionError:
                self.metrics.errors_by_type["connection_error"] += 1
                self.metrics.latencies.append(time.perf_counter() - start)
                self.metrics.requests_by_target[target_domain]["failed"] += 1
            except Exception as e:
                self.metrics.errors_by_type[type(e).__name__] += 1
                self.metrics.latencies.append(time.perf_counter() - start)
                self.metrics.requests_by_target[target_domain]["failed"] += 1

            # Only exceptions reach this point; back off before the next attempt.
            if attempt < self.max_retries:
                time.sleep(2 ** attempt)

        # All attempts raised an exception.
        self.metrics.failed += 1
        self._check_alerts()
        return None


# Usage
client = MonitoredProxyClient(max_retries=3)
client.on_alert(lambda msg: print(f"[ALERT] {msg}"))

urls = [f"https://example.com/product/{i}" for i in range(100)]
for url in urls:
    response = client.fetch(url)

print(client.metrics.summary())
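
In a long-running scraper, a latency list that only ever grows uses more and more memory, and old samples dilute recent degradation. One possible refinement, sketched here under the assumption that only recent samples matter, is a fixed-size rolling window. `RollingLatency` is a hypothetical helper, not part of any SDK, and it uses the same index rule as the percentile properties in the Python client.

```python
from collections import deque


class RollingLatency:
    """Keeps only the most recent latency samples (hypothetical helper)."""

    def __init__(self, max_samples: int = 1000):
        # deque(maxlen=...) silently discards the oldest sample when full.
        self._samples = deque(maxlen=max_samples)

    def add(self, latency_seconds: float) -> None:
        self._samples.append(latency_seconds)

    def percentile(self, p: float) -> float:
        """Index rule: floor(n * p / 100), clamped to the last element."""
        if not self._samples:
            return 0.0
        ordered = sorted(self._samples)
        idx = min(int(len(ordered) * p / 100), len(ordered) - 1)
        return ordered[idx]


window = RollingLatency(max_samples=5)
for value in [0.2, 0.4, 9.0, 0.3, 0.5, 0.6, 0.7]:
    window.add(value)

# Only the last five samples remain: 9.0, 0.3, 0.5, 0.6, 0.7
print(window.percentile(50))  # 0.6
print(window.percentile(95))  # 9.0
```

The window size trades responsiveness against noise: a small window reacts quickly to degradation but makes high percentiles jumpy.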
Node.js: Instrumented Proxy Client
const crypto = require('crypto');
const { EventEmitter } = require('events');
const { HttpsProxyAgent } = require('https-proxy-agent');
// Assumes node-fetch v2 (CommonJS). Node's built-in fetch ignores the `agent`
// option, so without this import requests would bypass the proxy.
const fetch = require('node-fetch');

class ProxyMetrics {
  constructor() {
    this.totalRequests = 0;
    this.successful = 0;
    this.failed = 0;
    this.retries = 0;
    this.latencies = [];
    this.statusCodes = {};
    this.errorsByType = {};
    this.bytesTransferred = 0;
    this.requestsByTarget = {};
  }

  // Returns a string with one decimal place, e.g. '93.5'.
  get successRate() {
    return this.totalRequests > 0
      ? ((this.successful / this.totalRequests) * 100).toFixed(1)
      : '0.0';
  }

  percentile(p) {
    if (this.latencies.length === 0) return 0;
    const sorted = [...this.latencies].sort((a, b) => a - b);
    const idx = Math.min(
      Math.floor(sorted.length * (p / 100)),
      sorted.length - 1
    );
    return sorted[idx];
  }

  summary() {
    return {
      totalRequests: this.totalRequests,
      successRate: `${this.successRate}%`,
      p50Latency: `${this.percentile(50).toFixed(3)}s`,
      p95Latency: `${this.percentile(95).toFixed(3)}s`,
      p99Latency: `${this.percentile(99).toFixed(3)}s`,
      retries: this.retries,
      bytesTransferred: this.bytesTransferred,
      statusDistribution: { ...this.statusCodes },
      // Five most frequent error types, most frequent first.
      topErrors: Object.fromEntries(
        Object.entries(this.errorsByType)
          .sort(([, a], [, b]) => b - a)
          .slice(0, 5)
      ),
    };
  }
}

class MonitoredProxyClient extends EventEmitter {
  constructor({ maxRetries = 3 } = {}) {
    super();
    this.metrics = new ProxyMetrics();
    this.maxRetries = maxRetries;
  }

  _checkAlerts() {
    // Skip evaluation until there are enough samples to be meaningful.
    if (this.metrics.totalRequests < 10) return;
    if (parseFloat(this.metrics.successRate) < 90) {
      this.emit('alert', `Low success rate: ${this.metrics.successRate}%`);
    }
    if (this.metrics.percentile(95) > 10) {
      this.emit('alert', `High p95 latency: ${this.metrics.percentile(95).toFixed(1)}s`);
    }
  }

  async fetch(url, { country } = {}) {
    for (let attempt = 0; attempt <= this.maxRetries; attempt++) {
      // New session ID per attempt, so a retry does not reuse the previous session.
      const sessionId = crypto.randomBytes(4).toString('hex');
      let username = `USERNAME-session-${sessionId}`;
      if (country) username += `-country-${country}`;
      const agent = new HttpsProxyAgent(
        `http://${username}:PASSWORD@gate.proxyhat.com:8080`
      );

      this.metrics.totalRequests++;
      if (attempt > 0) this.metrics.retries++;

      const startTime = Date.now();
      try {
        const response = await fetch(url, {
          agent,
          signal: AbortSignal.timeout(30000),
        });
        const latency = (Date.now() - startTime) / 1000;
        this.metrics.latencies.push(latency);
        this.metrics.statusCodes[response.status] =
          (this.metrics.statusCodes[response.status] || 0) + 1;

        if (response.status >= 400) {
          this.metrics.errorsByType[`HTTP_${response.status}`] =
            (this.metrics.errorsByType[`HTTP_${response.status}`] || 0) + 1;
          // Retry block-like responses with exponential backoff (1s, 2s, 4s, ...).
          if ([403, 429, 503].includes(response.status) && attempt < this.maxRetries) {
            await new Promise(r => setTimeout(r, 1000 * Math.pow(2, attempt)));
            continue;
          }
          this.metrics.failed++;
        } else {
          this.metrics.successful++;
          // Reading the body here consumes it; return the text as well if the
          // caller needs the payload.
          const body = await response.text();
          // Count bytes rather than UTF-16 code units.
          this.metrics.bytesTransferred += Buffer.byteLength(body);
        }

        this._checkAlerts();
        return response;
      } catch (err) {
        const latency = (Date.now() - startTime) / 1000;
        this.metrics.latencies.push(latency);
        this.metrics.errorsByType[err.name] =
          (this.metrics.errorsByType[err.name] || 0) + 1;
        if (attempt < this.maxRetries) {
          await new Promise(r => setTimeout(r, 1000 * Math.pow(2, attempt)));
          continue;
        }
        this.metrics.failed++;
      }
    }

    this._checkAlerts();
    return null;
  }
}
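
// Usage. Top-level await is not available in CommonJS modules, so the loop
// runs inside an async function.
async function main() {
  const client = new MonitoredProxyClient({ maxRetries: 3 });
  client.on('alert', msg => console.warn(`[ALERT] ${msg}`));

  const urls = Array.from({ length: 100 }, (_, i) =>
    `https://example.com/product/${i + 1}`
  );
  for (const url of urls) {
    await client.fetch(url);
  }

  console.log(client.metrics.summary());
}

main().catch(console.error);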
Go: Instrumented Proxy Client
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"io"
	"math"
	"net/http"
	"net/url"
	"sort"
	"sync"
	"time"
)

// Metrics holds counters shared across goroutines; mu guards every field.
type Metrics struct {
	mu               sync.Mutex
	TotalRequests    int
	Successful       int
	Failed           int
	Retries          int
	Latencies        []float64
	StatusCodes      map[int]int
	ErrorsByType     map[string]int
	BytesTransferred int64
}

func NewMetrics() *Metrics {
	return &Metrics{
		StatusCodes:  make(map[int]int),
		ErrorsByType: make(map[string]int),
	}
}

func (m *Metrics) RecordSuccess(latency float64, status int, bytes int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.TotalRequests++
	m.Successful++
	m.Latencies = append(m.Latencies, latency)
	m.StatusCodes[status]++
	m.BytesTransferred += int64(bytes)
}

func (m *Metrics) RecordFailure(latency float64, errType string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.TotalRequests++
	m.Failed++
	m.Latencies = append(m.Latencies, latency)
	m.ErrorsByType[errType]++
}

func (m *Metrics) Percentile(p float64) float64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	if len(m.Latencies) == 0 {
		return 0
	}
	// Sort a copy so the recorded order stays untouched.
	sorted := make([]float64, len(m.Latencies))
	copy(sorted, m.Latencies)
	sort.Float64s(sorted)
	idx := int(math.Min(float64(len(sorted)-1), float64(len(sorted))*p/100))
	return sorted[idx]
}

func (m *Metrics) SuccessRate() float64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.TotalRequests == 0 {
		return 0
	}
	return float64(m.Successful) / float64(m.TotalRequests) * 100
}

func (m *Metrics) Summary() string {
	// Read the counters under the lock, then release it before calling the
	// methods below, which acquire the lock themselves.
	m.mu.Lock()
	total, retries := m.TotalRequests, m.Retries
	m.mu.Unlock()
	return fmt.Sprintf(
		"Requests: %d | Success: %.1f%% | p50: %.3fs | p95: %.3fs | p99: %.3fs | Retries: %d",
		total, m.SuccessRate(),
		m.Percentile(50), m.Percentile(95), m.Percentile(99),
		retries,
	)
}

type MonitoredClient struct {
	metrics    *Metrics
	maxRetries int
}

func NewMonitoredClient(maxRetries int) *MonitoredClient {
	return &MonitoredClient{
		metrics:    NewMetrics(),
		maxRetries: maxRetries,
	}
}

// Fetch requests target through the proxy and records metrics for every
// attempt. The response body is read and closed here to count bytes, so the
// returned response carries status and headers only.
func (c *MonitoredClient) Fetch(target string) (*http.Response, error) {
	for attempt := 0; attempt <= c.maxRetries; attempt++ {
		// New session ID per attempt, so a retry does not reuse the previous session.
		b := make([]byte, 4)
		if _, err := rand.Read(b); err != nil {
			return nil, err
		}
		sessionID := hex.EncodeToString(b)

		proxyStr := fmt.Sprintf(
			"http://USERNAME-session-%s:PASSWORD@gate.proxyhat.com:8080",
			sessionID,
		)
		proxyURL, err := url.Parse(proxyStr)
		if err != nil {
			return nil, err
		}

		client := &http.Client{
			Transport: &http.Transport{
				Proxy: http.ProxyURL(proxyURL),
				// Each attempt builds its own transport, so keep-alive
				// connections would never be reused and would only pile up.
				DisableKeepAlives: true,
			},
			Timeout: 30 * time.Second,
		}

		if attempt > 0 {
			c.metrics.mu.Lock()
			c.metrics.Retries++
			c.metrics.mu.Unlock()
		}

		start := time.Now()
		resp, err := client.Get(target)
		latency := time.Since(start).Seconds()

		if err != nil {
			c.metrics.RecordFailure(latency, "connection_error")
			if attempt < c.maxRetries {
				// Exponential backoff: 1s, 2s, 4s, ...
				time.Sleep(time.Duration(math.Pow(2, float64(attempt))) * time.Second)
				continue
			}
			return nil, err
		}

		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()

		if resp.StatusCode >= 400 {
			c.metrics.RecordFailure(latency, fmt.Sprintf("HTTP_%d", resp.StatusCode))
			if attempt < c.maxRetries {
				time.Sleep(time.Duration(math.Pow(2, float64(attempt))) * time.Second)
				continue
			}
		} else {
			c.metrics.RecordSuccess(latency, resp.StatusCode, len(body))
		}

		return resp, nil
	}
	return nil, fmt.Errorf("all retries exhausted for %s", target)
}

func main() {
	client := NewMonitoredClient(3)

	for i := 0; i < 50; i++ {
		// Named target rather than url to avoid shadowing the net/url package.
		target := fmt.Sprintf("https://example.com/product/%d", i+1)
		client.Fetch(target)
	}

	fmt.Println(client.metrics.Summary())
}
Structured Logging for Proxy Requests
JSON-structured logs make it easier to aggregate and analyze proxy performance across distributed scrapers.
import json
import logging
import time
import uuid
from typing import Optional

import requests


class JSONProxyLogger:
    """Logs every proxy request as structured JSON."""

    def __init__(self, log_file: str = "proxy_requests.jsonl"):
        self.logger = logging.getLogger("proxy_json")
        # Attach the file handler only once, so creating a second instance
        # does not write every entry twice.
        if not self.logger.handlers:
            handler = logging.FileHandler(log_file)
            handler.setFormatter(logging.Formatter("%(message)s"))
            self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)

    def log_request(self, entry: dict):
        self.logger.info(json.dumps(entry))

    def fetch(self, url: str, country: Optional[str] = None) -> requests.Response:
        session_id = uuid.uuid4().hex[:8]
        username = f"USERNAME-session-{session_id}"
        if country:
            username += f"-country-{country}"
        proxy = f"http://{username}:PASSWORD@gate.proxyhat.com:8080"

        start = time.perf_counter()
        try:
            response = requests.get(
                url,
                proxies={"http": proxy, "https": proxy},
                timeout=30,
            )
            latency = time.perf_counter() - start
            self.log_request({
                "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
                "url": url,
                "status": response.status_code,
                "latency_ms": round(latency * 1000),
                "bytes": len(response.content),
                "session_id": session_id,
                "country": country,
                "success": response.status_code < 400,
            })
            return response
        except Exception as e:
            latency = time.perf_counter() - start
            self.log_request({
                "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
                "url": url,
                "error": str(e),
                "error_type": type(e).__name__,
                "latency_ms": round(latency * 1000),
                "session_id": session_id,
                "country": country,
                "success": False,
            })
            # Re-raise so the caller still sees the failure.
            raise


# Usage: each call appends one JSON line such as:
# {"timestamp":"2026-02-26T10:30:00Z","url":"https://...","status":200,"latency_ms":1234,...}
proxy_logger = JSONProxyLogger("proxy_requests.jsonl")
response = proxy_logger.fetch("https://example.com/data", country="us")
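
Once requests are logged as JSONL, aggregation becomes a matter of reading the file line by line. The sketch below shows one way this might look, assuming entries with the `success`, `status`, `error_type`, and `latency_ms` fields written by the logger in this section. `summarize_jsonl` is a hypothetical helper and the sample lines are made up for illustration.

```python
import json
from collections import Counter


def summarize_jsonl(lines) -> dict:
    """Aggregate JSONL log entries into a few headline numbers."""
    total = 0
    successes = 0
    latencies = []
    errors = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        total += 1
        if entry.get("success"):
            successes += 1
        else:
            # Exceptions carry "error_type"; failed HTTP responses carry "status".
            errors[entry.get("error_type") or f"HTTP_{entry.get('status')}"] += 1
        latencies.append(entry["latency_ms"])

    latencies.sort()
    p95 = latencies[min(int(len(latencies) * 0.95), len(latencies) - 1)] if latencies else 0
    return {
        "requests": total,
        "success_rate": successes * 100 / total if total else 0.0,
        "p95_latency_ms": p95,
        "errors": dict(errors),
    }


# Made-up sample lines in the logger's format.
sample = [
    '{"url": "https://example.com/a", "status": 200, "latency_ms": 800, "success": true}',
    '{"url": "https://example.com/b", "status": 403, "latency_ms": 1200, "success": false}',
    '{"url": "https://example.com/c", "error_type": "Timeout", "latency_ms": 30000, "success": false}',
    '{"url": "https://example.com/d", "status": 200, "latency_ms": 950, "success": true}',
]
print(summarize_jsonl(sample))
```

Because the function accepts any iterable of lines, an open file works as well: `with open("proxy_requests.jsonl") as f: summarize_jsonl(f)`.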
Periodic Health Reports
For long-running scrapers, generate periodic health reports that summarize performance over fixed windows.
import threading
import time
from datetime import datetime, timezone


class PeriodicReporter:
    """Generates periodic performance reports from proxy metrics."""

    def __init__(self, metrics: ProxyMetrics, interval_seconds: int = 60):
        self.metrics = metrics
        self.interval = interval_seconds
        self._stop_event = threading.Event()
        self._thread = None
        self._last_snapshot = None

    def start(self):
        self._stop_event.clear()
        self._last_snapshot = self._snapshot()
        self._thread = threading.Thread(target=self._report_loop, daemon=True)
        self._thread.start()

    def stop(self):
        # Setting the event wakes the reporting thread immediately instead of
        # waiting for the current interval to finish.
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()

    def _snapshot(self) -> dict:
        return {
            "total": self.metrics.total_requests,
            "success": self.metrics.successful,
            "failed": self.metrics.failed,
            "retries": self.metrics.retries,
            "time": time.time(),
        }

    def _report_loop(self):
        # wait() returns False on timeout (time to report) and True once
        # stop() has been called.
        while not self._stop_event.wait(self.interval):
            current = self._snapshot()
            prev = self._last_snapshot

            elapsed = current["time"] - prev["time"]
            requests_delta = current["total"] - prev["total"]
            success_delta = current["success"] - prev["success"]
            failed_delta = current["failed"] - prev["failed"]

            rps = requests_delta / elapsed if elapsed > 0 else 0
            window_success_rate = (
                (success_delta / requests_delta * 100)
                if requests_delta > 0 else 0
            )

            report = {
                "window": f"{self.interval}s",
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "requests": requests_delta,
                "rps": round(rps, 1),
                "success_rate": f"{window_success_rate:.1f}%",
                "failed": failed_delta,
                "cumulative_success_rate": f"{self.metrics.success_rate:.1f}%",
                # Computed over all samples so far, not just this window.
                "p95_latency": f"{self.metrics.p95_latency:.3f}s",
            }
            print(f"[REPORT] {report}")
            self._last_snapshot = current


# Usage with MonitoredProxyClient (urls is the list from the Python client example)
client = MonitoredProxyClient(max_retries=3)
reporter = PeriodicReporter(client.metrics, interval_seconds=30)
reporter.start()

# Scrape away; a report prints every 30 seconds
for url in urls:
    client.fetch(url)

reporter.stop()
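
The requests-per-second value in each report only becomes useful once it is compared with a baseline. One simple option, sketched here as an assumption rather than a prescribed method, is an exponentially weighted moving average that flags windows deviating from it by more than a tolerance. `BaselineTracker` and its default parameters are hypothetical.

```python
class BaselineTracker:
    """Flags values that deviate strongly from an exponentially weighted average."""

    def __init__(self, alpha: float = 0.2, tolerance: float = 0.5, warmup: int = 3):
        self.alpha = alpha          # weight given to the newest sample
        self.tolerance = tolerance  # allowed relative deviation (0.5 = 50%)
        self.warmup = warmup        # samples to observe before flagging anything
        self.baseline = None
        self._seen = 0

    def update(self, value: float) -> bool:
        """Return True if value deviates from the baseline, then fold it in."""
        self._seen += 1
        if self.baseline is None:
            self.baseline = value
            return False
        deviates = (
            self._seen > self.warmup
            and self.baseline > 0
            and abs(value - self.baseline) / self.baseline > self.tolerance
        )
        self.baseline = self.alpha * value + (1 - self.alpha) * self.baseline
        return deviates


tracker = BaselineTracker()
flags = [tracker.update(rps) for rps in [10.0, 10.0, 10.0, 10.0, 3.0]]
print(flags)  # [False, False, False, False, True]
```

The same tracker could be fed bytes per window to watch for bandwidth spikes. A smaller `alpha` makes the baseline slower to adapt, which keeps a sustained drop flagged for longer.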
Alert Rules and Thresholds
Set up intelligent alerting that avoids false positives during warm-up periods and transient blips.
| Alert | Condition | Cooldown | Action |
|---|---|---|---|
| Low success rate | Below 90% over a 5-minute window | 10 min | Investigate target blocks, check the proxy pool |
| High latency | p95 above 10 s over a 2-minute window | 5 min | Reduce concurrency, check target health |
| Error spike | A single error type exceeds 20% of requests | 5 min | Check whether the target has changed, rotate geo-location |
| Bandwidth spike | Transfer rate doubles from baseline | 15 min | Verify expected behavior, check for redirect loops |
| Zero throughput | No successful requests in 2 minutes | 2 min | Check proxy connectivity, verify credentials |
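
A rule with a window and a cooldown can be expressed in a few lines. The sketch below implements only the low-success-rate rule, with the 5-minute window and 10-minute cooldown from the table as defaults. The class name, the `min_samples` warm-up guard, and the injectable `now` parameter are assumptions made for illustration.

```python
import time
from collections import deque
from typing import Optional


class WindowedSuccessAlert:
    """Fires when the success rate in a sliding time window drops below a threshold."""

    def __init__(self, threshold=90.0, window_s=300, cooldown_s=600, min_samples=10):
        self.threshold = threshold
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.min_samples = min_samples  # skip evaluation during warm-up
        self._events = deque()          # (timestamp, success) pairs
        self._last_fired = None

    def record(self, success: bool, now: Optional[float] = None) -> bool:
        """Record one request outcome; return True if the alert fires."""
        now = time.monotonic() if now is None else now
        self._events.append((now, success))
        # Drop events that have left the window.
        while self._events and self._events[0][0] < now - self.window_s:
            self._events.popleft()
        if len(self._events) < self.min_samples:
            return False
        ok = sum(1 for _, s in self._events if s)
        rate = ok * 100 / len(self._events)
        in_cooldown = (
            self._last_fired is not None
            and now - self._last_fired < self.cooldown_s
        )
        if rate < self.threshold and not in_cooldown:
            self._last_fired = now
            return True
        return False


alert = WindowedSuccessAlert(min_samples=5)
# Simulated clock: one failed request per second.
fired = [alert.record(False, now=float(t)) for t in range(8)]
print(fired)  # [False, False, False, False, True, False, False, False]
```

The alert fires once, on the fifth sample, and then stays quiet for the cooldown period even though the success rate remains at zero. The other rules in the table could follow the same pattern with a different condition.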
Good monitoring is the difference between a scraping pipeline that runs reliably for months and one that silently produces garbage data. Invest in instrumentation up front: it pays for itself with the first production incident you catch early.
To build the middleware that feeds these metrics, see Building a Proxy Middleware Layer. To optimize throughput alongside monitoring, read Scaling Proxy Requests with Concurrency Control. For the complete system design, see Designing a Reliable Scraping Architecture.
Explore the Python SDK, Node SDK, and Go SDK for proxy integration, or check ProxyHat pricing and documentation to get started.






