(.abhi2) [pfls@ip-172-28-12-89 output]$ # On the VM — find the latest report
cat "$( { ls -t /home/pfls/abhishek/src/output/latency_report_*.txt 2>/dev/null || ls -t output/latency_report_*.txt; } | head -1 )"
======================================================================
CREDIT-RISK LATENCY ANALYSIS & BENCHMARK REPORT: 8e8e2725-cb98-4796-bbfa-cc11b532a990
======================================================================
Process PID : 323792
Session Start : 2026-06-02T12:38:03.174913
End-to-End Latency : 160.37 seconds
Request Status : SUCCESS
Total Documents Processed : 15
----------------------------------------------------------------------
PIPELINE STAGE LATENCY SUMMARY:
- S3 / Local Download : 2.54s ( 1.6% of total time)
- Azure OCR Engine : 39.48s ( 24.6% of total time)
- Embedding Generation : 5.75s ( 3.6% of total time)
- Vector Retrieval : 4.02s ( 2.5% of total time)
- LLM Inference : 44.44s ( 27.7% of total time)

OTHER Global Orchestration Stages:
- Verification Pipeline Time Seconds: 0.03s
- Cibil Processing Time Seconds: 0.03s
- Final Pdf Generation Time Seconds: 0.20s
- S3 Upload Time Seconds : 0.09s
- Lsq Submission Time Seconds : 4.13s
----------------------------------------------------------------------
DETAILED PER-DOCUMENT TIMING SUMMARY:
* CA1/CIBIL (CIBIL.json):
- S3 Download : 0.11s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.17s

* Details/Details (Activity.json):
- S3 Download : 0.10s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.16s

* Details/Details (Lead.json):
- S3 Download : 0.04s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.04s

* KP/CIBIL (CIBIL.json):
- S3 Download : 0.10s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.15s

* MA/CIBIL (CIBIL.json):
- S3 Download : 0.09s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.13s

* CA1/Address (Business Address Proof [1].pdf):
- S3 Download : 0.21s
- Azure OCR : 5.52s (Pages: 3, Avg/Page: 1.84s)
- Embedding Gen : 2.74s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 1.17s (Load: 0.00s, Prompt Eval: 1.13s)
(TTFT: 0.043s, GPU Util: 100.0%, VRAM: 20211.0/23028.0 MiB)
(Tokens: In=502, Out=25, speed=22.1 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 9.68s

* CA1/UDYAM certificate (UDYAM certificate [1].pdf):
- S3 Download : 0.21s
- Azure OCR : 3.37s (Pages: 5, Avg/Page: 0.67s)
- Embedding Gen : 0.60s (Chunks: 14)
- Retrieval Time : 3.82s
- LLM Inference : 3.09s (Load: 0.00s, Prompt Eval: 3.04s)
(TTFT: 0.042s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=1309, Out=78, speed=25.6 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 11.13s

* CA1/GST Registration Certificate (GST Registration Certificate [1].pdf):
- S3 Download : 0.19s
- Azure OCR : 3.28s (Pages: 3, Avg/Page: 1.09s)
- Embedding Gen : 0.25s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 2.53s (Load: 0.00s, Prompt Eval: 2.49s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=839, Out=68, speed=27.3 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 6.29s

* KP/Aadhar Card (Aadhar Card [1].pdf):
- S3 Download : 0.20s
- Azure OCR : 3.41s (Pages: 2, Avg/Page: 1.71s)
- Embedding Gen : 0.37s (Chunks: 1)
- Retrieval Time : 0.00s
- LLM Inference : 1.97s (Load: 0.00s, Prompt Eval: 1.93s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=1119, Out=41, speed=21.3 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.99s

* KP/Address (Key Person POA - Aadhar [1].pdf):
- S3 Download : 0.20s
- Azure OCR : 3.24s (Pages: 2, Avg/Page: 1.62s)
- Embedding Gen : 0.19s (Chunks: 1)
- Retrieval Time : 0.00s
- LLM Inference : 0.91s (Load: 0.00s, Prompt Eval: 0.87s)
(TTFT: 0.036s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=242, Out=19, speed=21.7 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 4.57s

* KP/Address (Key Person POA - Aadhar [2].pdf):
- S3 Download : 0.18s
- Azure OCR : 3.24s (Pages: 2, Avg/Page: 1.62s)
- Embedding Gen : 0.48s (Chunks: 5)
- Retrieval Time : 0.00s
- LLM Inference : 1.18s (Load: 0.00s, Prompt Eval: 1.14s)
(TTFT: 0.039s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=567, Out=25, speed=22.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.11s

* KP/PAN Card (PAN Card [1].pdf):
- S3 Download : 0.22s
- Azure OCR : 3.29s (Pages: 2, Avg/Page: 1.64s)
- Embedding Gen : 0.23s (Chunks: 3)
- Retrieval Time : 0.00s
- LLM Inference : 1.55s (Load: 0.00s, Prompt Eval: 1.51s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=962, Out=27, speed=17.9 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.33s

* MA/Address (Business Address Proof [1].pdf):
- S3 Download : 0.23s
- Azure OCR : 3.34s (Pages: 3, Avg/Page: 1.11s)
- Embedding Gen : 0.24s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 1.17s (Load: 0.00s, Prompt Eval: 1.13s)
(TTFT: 0.038s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=502, Out=25, speed=22.1 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.02s

* MA/UDYAM certificate (UDYAM certificate [1].pdf):
- S3 Download : 0.23s
- Azure OCR : 7.47s (Pages: 5, Avg/Page: 1.49s)
- Embedding Gen : 0.40s (Chunks: 14)
- Retrieval Time : 0.20s
- LLM Inference : 28.34s (Load: 0.00s, Prompt Eval: 28.30s)
(TTFT: 0.042s, GPU Util: 92.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=1309, Out=385, speed=13.6 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 93.41s

* MA/GST Registration Certificate (GST Registration Certificate [1].pdf):
- S3 Download : 0.23s
- Azure OCR : 3.32s (Pages: 3, Avg/Page: 1.11s)
- Embedding Gen : 0.24s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 2.53s (Load: 0.00s, Prompt Eval: 2.49s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=839, Out=68, speed=27.3 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 6.36s

----------------------------------------------------------------------
BOTTLENECK ANALYSIS & MIGRATION RECOMMENDATIONS:
- Slowest Document : MA/UDYAM certificate (93.41s)
- Primary Bottleneck : LLM Inference (44.44s, 27.7%)

[RECOMMENDATION] Optimize LLM Inference Layer:
- If using Ollama, consider replacing it with vLLM which supports Continuous Batching and PagedAttention, increasing throughput by 3x-10x.
- If already using vLLM, optimize using tensor parallelism, model quantization (AWQ/GPTQ), or by adjusting VRAM constraints.
======================================================================
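The stage totals printed in report 8e8e2725 do not add up to its end-to-end latency. The sketch below redoes that arithmetic using only the figures shown above; the variable and function names are illustrative and not part of the pipeline, and it assumes the stages ran sequentially so their times can be summed.

```python
# Figures copied from report 8e8e2725; all names here are hypothetical.
end_to_end = 160.37

pipeline_stages = {
    "S3 / Local Download": 2.54,
    "Azure OCR Engine": 39.48,
    "Embedding Generation": 5.75,
    "Vector Retrieval": 4.02,
    "LLM Inference": 44.44,
}
other_stages = {
    "Verification Pipeline": 0.03,
    "Cibil Processing": 0.03,
    "Final Pdf Generation": 0.20,
    "S3 Upload": 0.09,
    "Lsq Submission": 4.13,
}


def unaccounted_seconds(total, *stage_groups):
    """Return the wall-clock time not covered by any measured stage."""
    measured = sum(sum(group.values()) for group in stage_groups)
    return round(total - measured, 2)


gap = unaccounted_seconds(end_to_end, pipeline_stages, other_stages)
print(f"unaccounted: {gap:.2f}s ({gap / end_to_end:.1%} of total)")
```

Most of the gap appears to sit in one document: MA/UDYAM certificate reports a Total Doc Time of 93.41s, while its listed stages sum to 36.64s. The report does not say whether the remainder is a retry, queueing, or un-instrumented work.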
======================================================================
CREDIT-RISK LATENCY ANALYSIS & BENCHMARK REPORT: 1590863d-c7eb-4730-bb29-9adb934f2e0e
======================================================================
Process PID : 321173
Session Start : 2026-06-02T12:25:54.308650
End-to-End Latency : 163.71 seconds
Request Status : SUCCESS
Total Documents Processed : 15
----------------------------------------------------------------------
PIPELINE STAGE LATENCY SUMMARY:
- S3 / Local Download : 2.58s ( 1.6% of total time)
- Azure OCR Engine : 35.35s ( 21.6% of total time)
- Embedding Generation : 6.41s ( 3.9% of total time)
- Vector Retrieval : 4.07s ( 2.5% of total time)
- LLM Inference : 108.17s ( 66.1% of total time)

OTHER Global Orchestration Stages:
- Verification Pipeline Time Seconds: 0.03s
- Cibil Processing Time Seconds: 0.03s
- Final Pdf Generation Time Seconds: 0.20s
- S3 Upload Time Seconds : 0.10s
- Lsq Submission Time Seconds : 3.89s
----------------------------------------------------------------------
DETAILED PER-DOCUMENT TIMING SUMMARY:
* CA1/CIBIL (CIBIL.json):
- S3 Download : 0.11s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.16s

* Details/Details (Activity.json):
- S3 Download : 0.10s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.16s

* Details/Details (Lead.json):
- S3 Download : 0.04s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.04s

* KP/CIBIL (CIBIL.json):
- S3 Download : 0.09s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.13s

* MA/CIBIL (CIBIL.json):
- S3 Download : 0.12s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.16s

* CA1/Address (Business Address Proof [1].pdf):
- S3 Download : 0.46s
- Azure OCR : 3.40s (Pages: 3, Avg/Page: 1.13s)
- Embedding Gen : 3.23s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 1.17s (Load: 0.00s, Prompt Eval: 1.13s)
(TTFT: 0.042s, GPU Util: 100.0%, VRAM: 20211.0/23028.0 MiB)
(Tokens: In=502, Out=25, speed=22.1 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 8.30s

* CA1/UDYAM certificate (UDYAM certificate [1].pdf):
- S3 Download : 0.20s
- Azure OCR : 3.38s (Pages: 5, Avg/Page: 0.68s)
- Embedding Gen : 0.78s (Chunks: 14)
- Retrieval Time : 3.86s
- LLM Inference : 47.48s (Load: 0.00s, Prompt Eval: 47.44s)
(TTFT: 0.042s, GPU Util: 87.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=1309, Out=591, speed=12.5 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 55.74s

* CA1/GST Registration Certificate (GST Registration Certificate [1].pdf):
- S3 Download : 0.18s
- Azure OCR : 3.34s (Pages: 3, Avg/Page: 1.11s)
- Embedding Gen : 0.24s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 2.53s (Load: 0.00s, Prompt Eval: 2.49s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=839, Out=68, speed=27.3 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 6.32s

* KP/Aadhar Card (Aadhar Card [1].pdf):
- S3 Download : 0.16s
- Azure OCR : 3.29s (Pages: 2, Avg/Page: 1.65s)
- Embedding Gen : 0.38s (Chunks: 1)
- Retrieval Time : 0.00s
- LLM Inference : 1.97s (Load: 0.00s, Prompt Eval: 1.93s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=1119, Out=41, speed=21.3 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.84s

* KP/Address (Key Person POA - Aadhar [1].pdf):
- S3 Download : 0.23s
- Azure OCR : 3.34s (Pages: 2, Avg/Page: 1.67s)
- Embedding Gen : 0.19s (Chunks: 1)
- Retrieval Time : 0.00s
- LLM Inference : 1.12s (Load: 0.00s, Prompt Eval: 1.08s)
(TTFT: 0.036s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=242, Out=25, speed=23.1 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 4.91s

* KP/Address (Key Person POA - Aadhar [2].pdf):
- S3 Download : 0.15s
- Azure OCR : 3.25s (Pages: 2, Avg/Page: 1.62s)
- Embedding Gen : 0.44s (Chunks: 5)
- Retrieval Time : 0.00s
- LLM Inference : 1.18s (Load: 0.00s, Prompt Eval: 1.14s)
(TTFT: 0.040s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=567, Out=25, speed=22.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.05s

* KP/PAN Card (PAN Card [1].pdf):
- S3 Download : 0.18s
- Azure OCR : 3.24s (Pages: 2, Avg/Page: 1.62s)
- Embedding Gen : 0.23s (Chunks: 3)
- Retrieval Time : 0.00s
- LLM Inference : 1.55s (Load: 0.00s, Prompt Eval: 1.51s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=962, Out=27, speed=17.9 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.23s

* MA/Address (Business Address Proof [1].pdf):
- S3 Download : 0.19s
- Azure OCR : 3.30s (Pages: 3, Avg/Page: 1.10s)
- Embedding Gen : 0.26s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 1.17s (Load: 0.00s, Prompt Eval: 1.13s)
(TTFT: 0.039s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=502, Out=25, speed=22.1 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 4.96s

* MA/UDYAM certificate (UDYAM certificate [1].pdf):
- S3 Download : 0.19s
- Azure OCR : 4.47s (Pages: 5, Avg/Page: 0.89s)
- Embedding Gen : 0.42s (Chunks: 14)
- Retrieval Time : 0.21s
- LLM Inference : 47.47s (Load: 0.00s, Prompt Eval: 47.43s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=1309, Out=599, speed=12.6 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 52.81s

* MA/GST Registration Certificate (GST Registration Certificate [1].pdf):
- S3 Download : 0.20s
- Azure OCR : 4.34s (Pages: 3, Avg/Page: 1.45s)
- Embedding Gen : 0.24s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 2.53s (Load: 0.00s, Prompt Eval: 2.49s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=839, Out=68, speed=27.3 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 7.35s

----------------------------------------------------------------------
BOTTLENECK ANALYSIS & MIGRATION RECOMMENDATIONS:
- Slowest Document : CA1/UDYAM certificate (55.74s)
- Primary Bottleneck : LLM Inference (108.17s, 66.1%)

[RECOMMENDATION] Optimize LLM Inference Layer:
- If using Ollama, consider replacing it with vLLM which supports Continuous Batching and PagedAttention, increasing throughput by 3x-10x.
- If already using vLLM, optimize using tensor parallelism, model quantization (AWQ/GPTQ), or by adjusting VRAM constraints.
======================================================================
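In report 1590863d the two UDYAM documents account for 94.95s of the 108.17s LLM total. The sketch below checks whether the reported per-document LLM time is consistent with output tokens divided by generation speed. It is a reading of the printed figures only, not a statement about how the pipeline measures time, and the function name is hypothetical.

```python
def estimated_decode_seconds(output_tokens, tokens_per_second):
    """Estimate generation time as output tokens divided by speed."""
    if tokens_per_second <= 0:
        return 0.0  # JSON inputs report 0 tokens at 0.0 tok/s
    return output_tokens / tokens_per_second


# (document, Out tokens, speed in tok/s, reported LLM Inference seconds),
# copied from report 1590863d.
samples = [
    ("CA1/UDYAM certificate", 591, 12.5, 47.48),
    ("MA/UDYAM certificate", 599, 12.6, 47.47),
    ("CA1/GST Registration Certificate", 68, 27.3, 2.53),
]

for name, out_tokens, speed, reported in samples:
    estimate = estimated_decode_seconds(out_tokens, speed)
    print(f"{name}: estimated {estimate:.2f}s vs reported {reported:.2f}s")
```

If the estimate tracks the reported time this closely, output length (591 and 599 tokens here, against 68 for the GST certificate) is the likelier driver of LLM latency than prompt size, so limiting or constraining the generated output may be worth testing before changing the serving stack.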
======================================================================
CREDIT-RISK LATENCY ANALYSIS & BENCHMARK REPORT: 7cfca04c-e838-4100-9bd5-ef0ed51888c6
======================================================================
Process PID : 319094
Session Start : 2026-06-02T12:15:33.210421
End-to-End Latency : 217.77 seconds
Request Status : SUCCESS
Total Documents Processed : 15
----------------------------------------------------------------------
PIPELINE STAGE LATENCY SUMMARY:
- S3 / Local Download : 2.47s ( 1.1% of total time)
- Azure OCR Engine : 39.25s ( 18.0% of total time)
- Embedding Generation : 6.01s ( 2.8% of total time)
- Vector Retrieval : 3.98s ( 1.8% of total time)
- LLM Inference : 63.76s ( 29.3% of total time)

OTHER Global Orchestration Stages:
- Verification Pipeline Time Seconds: 0.03s
- Cibil Processing Time Seconds: 0.03s
- Final Pdf Generation Time Seconds: 0.20s
- S3 Upload Time Seconds : 0.09s
- Lsq Submission Time Seconds : 3.95s
----------------------------------------------------------------------
DETAILED PER-DOCUMENT TIMING SUMMARY:
* CA1/CIBIL (CIBIL.json):
- S3 Download : 0.12s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.17s

* Details/Details (Activity.json):
- S3 Download : 0.08s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.14s

* Details/Details (Lead.json):
- S3 Download : 0.05s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.05s

* KP/CIBIL (CIBIL.json):
- S3 Download : 0.10s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.14s

* MA/CIBIL (CIBIL.json):
- S3 Download : 0.09s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.14s

* CA1/Address (Business Address Proof [1].pdf):
- S3 Download : 0.17s
- Azure OCR : 3.39s (Pages: 3, Avg/Page: 1.13s)
- Embedding Gen : 2.85s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 1.17s (Load: 0.00s, Prompt Eval: 1.13s)
(TTFT: 0.042s, GPU Util: 100.0%, VRAM: 20211.0/23028.0 MiB)
(Tokens: In=694, Out=50, speed=44.2 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 7.62s

* CA1/UDYAM certificate (UDYAM certificate [1].pdf):
- S3 Download : 0.25s
- Azure OCR : 4.44s (Pages: 5, Avg/Page: 0.89s)
- Embedding Gen : 0.77s (Chunks: 14)
- Retrieval Time : 3.78s
- LLM Inference : 47.48s (Load: 0.00s, Prompt Eval: 47.43s)
(TTFT: 0.042s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=1592, Out=2000, speed=42.2 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 104.26s

* CA1/GST Registration Certificate (GST Registration Certificate [1].pdf):
- S3 Download : 0.22s
- Azure OCR : 3.37s (Pages: 3, Avg/Page: 1.12s)
- Embedding Gen : 0.23s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 2.53s (Load: 0.00s, Prompt Eval: 2.49s)
(TTFT: 0.043s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=1020, Out=108, speed=43.4 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 6.39s

* KP/Aadhar Card (Aadhar Card [1].pdf):
- S3 Download : 0.17s
- Azure OCR : 3.25s (Pages: 2, Avg/Page: 1.62s)
- Embedding Gen : 0.38s (Chunks: 1)
- Retrieval Time : 0.00s
- LLM Inference : 1.97s (Load: 0.00s, Prompt Eval: 1.93s)
(TTFT: 0.042s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=1203, Out=84, speed=43.5 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.81s

* KP/Address (Key Person POA - Aadhar [1].pdf):
- S3 Download : 0.17s
- Azure OCR : 5.33s (Pages: 2, Avg/Page: 2.67s)
- Embedding Gen : 0.19s (Chunks: 1)
- Retrieval Time : 0.00s
- LLM Inference : 1.09s (Load: 0.00s, Prompt Eval: 1.06s)
(TTFT: 0.036s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=396, Out=47, speed=44.4 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 6.82s

* KP/Address (Key Person POA - Aadhar [2].pdf):
- S3 Download : 0.19s
- Azure OCR : 5.28s (Pages: 2, Avg/Page: 2.64s)
- Embedding Gen : 0.45s (Chunks: 5)
- Retrieval Time : 0.00s
- LLM Inference : 1.18s (Load: 0.00s, Prompt Eval: 1.14s)
(TTFT: 0.039s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=1107, Out=50, speed=43.9 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 7.14s

* KP/PAN Card (PAN Card [1].pdf):
- S3 Download : 0.21s
- Azure OCR : 3.23s (Pages: 2, Avg/Page: 1.62s)
- Embedding Gen : 0.24s (Chunks: 3)
- Retrieval Time : 0.00s
- LLM Inference : 1.55s (Load: 0.00s, Prompt Eval: 1.51s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=1110, Out=66, speed=43.7 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.27s

* MA/Address (Business Address Proof [1].pdf):
- S3 Download : 0.22s
- Azure OCR : 3.27s (Pages: 3, Avg/Page: 1.09s)
- Embedding Gen : 0.26s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 1.17s (Load: 0.00s, Prompt Eval: 1.13s)
(TTFT: 0.039s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=694, Out=50, speed=44.2 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 4.97s

* MA/UDYAM certificate (UDYAM certificate [1].pdf):
- S3 Download : 0.18s
- Azure OCR : 4.39s (Pages: 5, Avg/Page: 0.88s)
- Embedding Gen : 0.40s (Chunks: 14)
- Retrieval Time : 0.20s
- LLM Inference : 3.09s (Load: 0.00s, Prompt Eval: 3.04s)
(TTFT: 0.042s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=1592, Out=131, speed=43.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 55.80s

* MA/GST Registration Certificate (GST Registration Certificate [1].pdf):
- S3 Download : 0.24s
- Azure OCR : 3.29s (Pages: 3, Avg/Page: 1.10s)
- Embedding Gen : 0.24s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 2.53s (Load: 0.00s, Prompt Eval: 2.49s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1020, Out=108, speed=43.4 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 6.34s

----------------------------------------------------------------------
BOTTLENECK ANALYSIS & MIGRATION RECOMMENDATIONS:
- Slowest Document : CA1/UDYAM certificate (104.26s)
- Primary Bottleneck : LLM Inference (63.76s, 29.3%)

[RECOMMENDATION] Optimize LLM Inference Layer:
- If using Ollama, consider replacing it with vLLM which supports Continuous Batching and PagedAttention, increasing throughput by 3x-10x.
- If already using vLLM, optimize using tensor parallelism, model quantization (AWQ/GPTQ), or by adjusting VRAM constraints.
======================================================================
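The "Slowest Document" line can be reproduced from the per-document blocks. The report generator's own logic is not shown, so the following is only a sketch of one way to do it: it parses a short excerpt copied from report 7cfca04c with two regular expressions (both written for this example) and picks the largest Total Doc Time.

```python
import re

# Excerpt copied from report 7cfca04c; only the lines the parser needs.
excerpt = """\
* CA1/UDYAM certificate (UDYAM certificate [1].pdf):
- LLM Inference : 47.48s (Load: 0.00s, Prompt Eval: 47.43s)
- Total Doc Time : 104.26s

* KP/PAN Card (PAN Card [1].pdf):
- LLM Inference : 1.55s (Load: 0.00s, Prompt Eval: 1.51s)
- Total Doc Time : 5.27s

* MA/UDYAM certificate (UDYAM certificate [1].pdf):
- LLM Inference : 3.09s (Load: 0.00s, Prompt Eval: 3.04s)
- Total Doc Time : 55.80s
"""

# Hypothetical patterns matching the layout seen in the report.
DOC_HEADER = re.compile(r"^\* (?P<name>.+?) \((?P<file>.+)\):$")
TOTAL_LINE = re.compile(r"^- Total Doc Time\s*:\s*(?P<secs>[\d.]+)s$")


def parse_doc_totals(text):
    """Map (document name, file name) to its Total Doc Time in seconds."""
    totals = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = DOC_HEADER.match(line)
        if header:
            # Key on name and file: "KP/Address" appears twice per report.
            current = (header.group("name"), header.group("file"))
            continue
        total = TOTAL_LINE.match(line)
        if total and current is not None:
            totals[current] = float(total.group("secs"))
    return totals


totals = parse_doc_totals(excerpt)
slowest = max(totals, key=totals.get)
print(f"Slowest Document : {slowest[0]} ({totals[slowest]:.2f}s)")
```

Running the same parser over each saved report would make it easier to compare a given document across runs than reading the blocks by eye.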
======================================================================
CREDIT-RISK LATENCY ANALYSIS & BENCHMARK REPORT: 874e915c-5d9d-4dce-83b4-0ece2de605a3
======================================================================
Process PID : 316494
Session Start : 2026-06-02T12:04:43.387534
End-to-End Latency : 119.77 seconds
Request Status : SUCCESS
Total Documents Processed : 15
----------------------------------------------------------------------
PIPELINE STAGE LATENCY SUMMARY:
- S3 / Local Download : 2.55s ( 2.1% of total time)
- Azure OCR Engine : 39.58s ( 33.0% of total time)
- Embedding Generation : 2.65s ( 2.2% of total time)
- Vector Retrieval : 4.18s ( 3.5% of total time)
- LLM Inference : 19.23s ( 16.1% of total time)

OTHER Global Orchestration Stages:
- Verification Pipeline Time Seconds: 0.03s
- Cibil Processing Time Seconds: 0.03s
- Final Pdf Generation Time Seconds: 0.19s
- S3 Upload Time Seconds : 0.10s
- Lsq Submission Time Seconds : 0.69s
----------------------------------------------------------------------
DETAILED PER-DOCUMENT TIMING SUMMARY:
* CA1/CIBIL (CIBIL.json):
- S3 Download : 0.13s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.18s

* Details/Details (Activity.json):
- S3 Download : 0.10s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.17s

* Details/Details (Lead.json):
- S3 Download : 0.06s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.06s

* KP/CIBIL (CIBIL.json):
- S3 Download : 0.11s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.16s

* MA/CIBIL (CIBIL.json):
- S3 Download : 0.09s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.13s

* CA1/Address (Business Address Proof [1].pdf):
- S3 Download : 0.18s
- Azure OCR : 3.42s (Pages: 3, Avg/Page: 1.14s)
- Embedding Gen : 0.28s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 1.18s (Load: 0.00s, Prompt Eval: 1.13s)
(TTFT: 0.043s, GPU Util: 100.0%, VRAM: 20213.0/23028.0 MiB)
(Tokens: In=694, Out=50, speed=44.2 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.09s

* CA1/UDYAM certificate (UDYAM certificate [1].pdf):
- S3 Download : 0.24s
- Azure OCR : 4.47s (Pages: 5, Avg/Page: 0.89s)
- Embedding Gen : 0.40s (Chunks: 14)
- Retrieval Time : 3.97s
- LLM Inference : 2.92s (Load: 0.00s, Prompt Eval: 2.88s)
(TTFT: 0.042s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1592, Out=124, speed=43.1 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 59.54s

* CA1/GST Registration Certificate (GST Registration Certificate [1].pdf):
- S3 Download : 0.25s
- Azure OCR : 3.33s (Pages: 3, Avg/Page: 1.11s)
- Embedding Gen : 0.25s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 2.53s (Load: 0.00s, Prompt Eval: 2.49s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1020, Out=108, speed=43.4 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 6.40s

* KP/Aadhar Card (Aadhar Card [1].pdf):
- S3 Download : 0.18s
- Azure OCR : 3.33s (Pages: 2, Avg/Page: 1.66s)
- Embedding Gen : 0.19s (Chunks: 1)
- Retrieval Time : 0.00s
- LLM Inference : 1.97s (Load: 0.00s, Prompt Eval: 1.93s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1203, Out=84, speed=43.5 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.70s

* KP/Address (Key Person POA - Aadhar [1].pdf):
- S3 Download : 0.21s
- Azure OCR : 3.24s (Pages: 2, Avg/Page: 1.62s)
- Embedding Gen : 0.19s (Chunks: 1)
- Retrieval Time : 0.00s
- LLM Inference : 1.12s (Load: 0.00s, Prompt Eval: 1.08s)
(TTFT: 0.036s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=396, Out=48, speed=44.4 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 4.80s

* KP/Address (Key Person POA - Aadhar [2].pdf):
- S3 Download : 0.15s
- Azure OCR : 4.28s (Pages: 2, Avg/Page: 2.14s)
- Embedding Gen : 0.25s (Chunks: 5)
- Retrieval Time : 0.00s
- LLM Inference : 1.18s (Load: 0.00s, Prompt Eval: 1.14s)
(TTFT: 0.039s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1107, Out=50, speed=43.9 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.89s

* KP/PAN Card (PAN Card [1].pdf):
- S3 Download : 0.20s
- Azure OCR : 4.41s (Pages: 2, Avg/Page: 2.21s)
- Embedding Gen : 0.22s (Chunks: 3)
- Retrieval Time : 0.00s
- LLM Inference : 1.55s (Load: 0.00s, Prompt Eval: 1.51s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1110, Out=66, speed=43.7 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 6.43s

* MA/Address (Business Address Proof [1].pdf):
- S3 Download : 0.21s
- Azure OCR : 4.31s (Pages: 3, Avg/Page: 1.44s)
- Embedding Gen : 0.24s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 1.17s (Load: 0.00s, Prompt Eval: 1.13s)
(TTFT: 0.038s, GPU Util: 97.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=694, Out=50, speed=44.2 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.97s

* MA/UDYAM certificate (UDYAM certificate [1].pdf):
- S3 Download : 0.21s
- Azure OCR : 3.39s (Pages: 5, Avg/Page: 0.68s)
- Embedding Gen : 0.39s (Chunks: 14)
- Retrieval Time : 0.21s
- LLM Inference : 3.09s (Load: 0.00s, Prompt Eval: 3.04s)
(TTFT: 0.042s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1592, Out=131, speed=43.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 7.32s

* MA/GST Registration Certificate (GST Registration Certificate [1].pdf):
- S3 Download : 0.21s
- Azure OCR : 5.42s (Pages: 3, Avg/Page: 1.81s)
- Embedding Gen : 0.25s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 2.53s (Load: 0.00s, Prompt Eval: 2.49s)
(TTFT: 0.042s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1020, Out=108, speed=43.4 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 8.44s

----------------------------------------------------------------------
BOTTLENECK ANALYSIS & MIGRATION RECOMMENDATIONS:
- Slowest Document : CA1/UDYAM certificate (59.54s)
- Primary Bottleneck : Azure OCR Engine (39.58s, 33.0%)

[RECOMMENDATION] Optimize Azure Document Intelligence:
- Use the 'prebuilt-read' model instead of 'prebuilt-layout' for simple structured documents (Aadhaar, PAN) as it is twice as fast.
- If document lengths are high, run asynchronous table/layout extractions in parallel worker pools to hide polling delays.
======================================================================
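The recommendation above to run OCR in parallel worker pools can be sketched as follows. This is a simulation under stated assumptions, not the pipeline's code: `fake_ocr` is a hypothetical stand-in that only sleeps, the durations are the per-document Azure OCR times from report 874e915c scaled down to 1%, and the worker count of 4 is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Per-document Azure OCR seconds copied from report 874e915c.
ocr_seconds = {
    "CA1/Address": 3.42,
    "CA1/UDYAM certificate": 4.47,
    "CA1/GST Registration Certificate": 3.33,
    "KP/Aadhar Card": 3.33,
    "KP/Address [1]": 3.24,
    "KP/Address [2]": 4.28,
    "KP/PAN Card": 4.41,
    "MA/Address": 4.31,
    "MA/UDYAM certificate": 3.39,
    "MA/GST Registration Certificate": 5.42,
}

SCALE = 0.01  # sleep 1% of the real duration so the demo finishes quickly


def fake_ocr(doc_name):
    """Hypothetical stand-in for a blocking OCR request; it only sleeps."""
    time.sleep(ocr_seconds[doc_name] * SCALE)
    return doc_name, ocr_seconds[doc_name]


def run_parallel(doc_names, max_workers=4):
    """Submit every document to a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_ocr, doc_names))


start = time.perf_counter()
results = run_parallel(list(ocr_seconds))
elapsed = time.perf_counter() - start

sequential = sum(ocr_seconds.values()) * SCALE
print(f"sequential estimate: {sequential:.2f}s, parallel wall time: {elapsed:.2f}s")
```

Threads suit this case because the work is waiting on a remote service rather than using the CPU. Whether the Azure resource's rate limits allow this many concurrent requests is not stated in the report and would need checking before applying the idea.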
======================================================================
CREDIT-RISK LATENCY ANALYSIS & BENCHMARK REPORT: e6f026c3-92af-4911-b7ca-0442866f68e7
======================================================================
Process PID : 313486
Session Start : 2026-06-02T12:01:37.002952
End-to-End Latency : 115.05 seconds
Request Status : SUCCESS
Total Documents Processed : 15
----------------------------------------------------------------------
PIPELINE STAGE LATENCY SUMMARY:
- S3 / Local Download : 2.61s ( 2.3% of total time)
- Azure OCR Engine : 35.37s ( 30.7% of total time)
- Embedding Generation : 5.51s ( 4.8% of total time)
- Vector Retrieval : 4.16s ( 3.6% of total time)
- LLM Inference : 63.60s ( 55.3% of total time)

OTHER Global Orchestration Stages:
- Verification Pipeline Time Seconds: 0.03s
- Cibil Processing Time Seconds: 0.03s
- Final Pdf Generation Time Seconds: 0.20s
- S3 Upload Time Seconds : 0.10s
- Lsq Submission Time Seconds : 0.61s
----------------------------------------------------------------------
DETAILED PER-DOCUMENT TIMING SUMMARY:
* CA1/CIBIL (CIBIL.json):
- S3 Download : 0.14s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.20s

* Details/Details (Activity.json):
- S3 Download : 0.09s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.15s

* Details/Details (Lead.json):
- S3 Download : 0.03s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.03s

* KP/CIBIL (CIBIL.json):
- S3 Download : 0.10s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.14s

* MA/CIBIL (CIBIL.json):
- S3 Download : 0.09s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.13s

* CA1/Address (Business Address Proof [1].pdf):
- S3 Download : 0.22s
- Azure OCR : 3.47s (Pages: 3, Avg/Page: 1.16s)
- Embedding Gen : 2.59s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 1.18s (Load: 0.00s, Prompt Eval: 1.13s)
(TTFT: 0.047s, GPU Util: 100.0%, VRAM: 20700.0/23028.0 MiB)
(Tokens: In=694, Out=50, speed=44.2 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 7.50s

* CA1/UDYAM certificate (UDYAM certificate [1].pdf):
- S3 Download : 0.23s
- Azure OCR : 4.42s (Pages: 5, Avg/Page: 0.88s)
- Embedding Gen : 0.57s (Chunks: 14)
- Retrieval Time : 3.95s
- LLM Inference : 3.11s (Load: 0.00s, Prompt Eval: 3.07s)
(TTFT: 0.042s, GPU Util: 100.0%, VRAM: 21190.0/23028.0 MiB)
(Tokens: In=1592, Out=132, speed=43.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 12.32s

* CA1/GST Registration Certificate (GST Registration Certificate [1].pdf):
- S3 Download : 0.23s
- Azure OCR : 3.38s (Pages: 3, Avg/Page: 1.13s)
- Embedding Gen : 0.23s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 2.53s (Load: 0.00s, Prompt Eval: 2.49s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 21190.0/23028.0 MiB)
(Tokens: In=1020, Out=108, speed=43.4 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 6.40s

* KP/Aadhar Card (Aadhar Card [1].pdf):
- S3 Download : 0.22s
- Azure OCR : 3.31s (Pages: 2, Avg/Page: 1.66s)
- Embedding Gen : 0.37s (Chunks: 1)
- Retrieval Time : 0.00s
- LLM Inference : 1.97s (Load: 0.00s, Prompt Eval: 1.93s)
(TTFT: 0.042s, GPU Util: 100.0%, VRAM: 21190.0/23028.0 MiB)
(Tokens: In=1203, Out=84, speed=43.5 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.92s

* KP/Address (Key Person POA - Aadhar [1].pdf):
- S3 Download : 0.23s
- Azure OCR : 3.22s (Pages: 2, Avg/Page: 1.61s)
- Embedding Gen : 0.19s (Chunks: 1)
- Retrieval Time : 0.00s
- LLM Inference : 0.91s (Load: 0.00s, Prompt Eval: 0.87s)
(TTFT: 0.036s, GPU Util: 100.0%, VRAM: 21190.0/23028.0 MiB)
(Tokens: In=396, Out=39, speed=44.6 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 4.59s

* KP/Address (Key Person POA - Aadhar [2].pdf):
- S3 Download : 0.16s
- Azure OCR : 4.28s (Pages: 2, Avg/Page: 2.14s)
- Embedding Gen : 0.47s (Chunks: 5)
- Retrieval Time : 0.00s
- LLM Inference : 1.18s (Load: 0.00s, Prompt Eval: 1.14s)
(TTFT: 0.039s, GPU Util: 100.0%, VRAM: 21190.0/23028.0 MiB)
(Tokens: In=1107, Out=50, speed=43.9 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 6.11s

* KP/PAN Card (PAN Card [1].pdf):
- S3 Download : 0.18s
- Azure OCR : 3.29s (Pages: 2, Avg/Page: 1.64s)
- Embedding Gen : 0.22s (Chunks: 3)
- Retrieval Time : 0.00s
- LLM Inference : 1.55s (Load: 0.00s, Prompt Eval: 1.51s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 21190.0/23028.0 MiB)
(Tokens: In=1110, Out=66, speed=43.7 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.27s

* MA/Address (Business Address Proof [1].pdf):
- S3 Download : 0.20s
- Azure OCR : 3.26s (Pages: 3, Avg/Page: 1.09s)
- Embedding Gen : 0.23s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 1.17s (Load: 0.00s, Prompt Eval: 1.13s)
(TTFT: 0.040s, GPU Util: 100.0%, VRAM: 21190.0/23028.0 MiB)
(Tokens: In=694, Out=50, speed=44.2 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 4.90s

* MA/UDYAM certificate (UDYAM certificate [1].pdf):
- S3 Download : 0.26s
- Azure OCR : 3.41s (Pages: 5, Avg/Page: 0.68s)
- Embedding Gen : 0.41s (Chunks: 14)
- Retrieval Time : 0.21s
- LLM Inference : 47.48s (Load: 0.00s, Prompt Eval: 47.44s)
(TTFT: 0.043s, GPU Util: 100.0%, VRAM: 21192.0/23028.0 MiB)
(Tokens: In=1592, Out=2000, speed=42.2 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 51.81s

* MA/GST Registration Certificate (GST Registration Certificate [1].pdf):
- S3 Download : 0.24s
- Azure OCR : 3.34s (Pages: 3, Avg/Page: 1.11s)
- Embedding Gen : 0.25s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 2.53s (Load: 0.00s, Prompt Eval: 2.49s)
(TTFT: 0.041s, GPU Util: 98.0%, VRAM: 21192.0/23028.0 MiB)
(Tokens: In=1020, Out=108, speed=43.4 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 6.39s

----------------------------------------------------------------------
BOTTLENECK ANALYSIS & MIGRATION RECOMMENDATIONS:
- Slowest Document : MA/UDYAM certificate (51.81s)
- Primary Bottleneck : LLM Inference (63.60s, 55.3%)

[RECOMMENDATION] Optimize LLM Inference Layer:
- If using Ollama, consider replacing it with vLLM which supports Continuous Batching and PagedAttention, increasing throughput by 3x-10x.
- If already using vLLM, optimize using tensor parallelism, model quantization (AWQ/GPTQ), or by adjusting VRAM constraints.
======================================================================
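Note on the MA/UDYAM outlier in the report above: the two UDYAM certificate documents have the same input size (In=1592) but very different output sizes (Out=132 vs Out=2000). The 47.48s LLM time is close to what 2000 tokens would take at the reported 42.2 tok/s. The sketch below reproduces that arithmetic. It assumes the reported "speed" is the decode rate; the helper name is hypothetical and not part of the pipeline. If the assumption holds, the time the report labels "Prompt Eval" is mostly generation time, and Out=2000 may be an output-token cap being reached, although the report itself does not confirm this.

```python
def decode_seconds(out_tokens: int, tokens_per_second: float) -> float:
    """Estimate generation time from output token count and decode speed.

    Hypothetical helper for illustration; not part of the benchmarked pipeline.
    """
    if tokens_per_second <= 0:
        return 0.0  # documents that skipped the LLM report speed=0.0
    return out_tokens / tokens_per_second


# Figures copied from the report above (Out tokens, tok/s, reported LLM seconds).
docs = {
    "MA/UDYAM certificate": {"out": 2000, "speed": 42.2, "llm_s": 47.48},
    "CA1/UDYAM certificate": {"out": 132, "speed": 43.0, "llm_s": 3.11},
}

for name, d in docs.items():
    est = decode_seconds(d["out"], d["speed"])
    print(f"{name}: estimated decode {est:.2f}s vs reported {d['llm_s']:.2f}s")
```

Under this reading, reducing the number of generated tokens for that document would shorten the run more directly than raising raw throughput.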
======================================================================
CREDIT-RISK LATENCY ANALYSIS & BENCHMARK REPORT: 66f9f227-40a6-4d86-a7fc-4e5d031be6cd
======================================================================
Process PID : 313476
Session Start : 2026-06-02T11:50:21.263287
End-to-End Latency : 253.47 seconds
Request Status : SUCCESS
Total Documents Processed : 15
----------------------------------------------------------------------
PIPELINE STAGE LATENCY SUMMARY:
- S3 / Local Download : 2.38s ( 0.9% of total time)
- Azure OCR Engine : 34.31s ( 13.5% of total time)
- Embedding Generation : 2.70s ( 1.1% of total time)
- Vector Retrieval : 4.06s ( 1.6% of total time)
- LLM Inference : 107.96s ( 42.6% of total time)

OTHER Global Orchestration Stages:
- Verification Pipeline Time Seconds: 0.03s
- Cibil Processing Time Seconds: 0.03s
- Final Pdf Generation Time Seconds: 0.19s
- S3 Upload Time Seconds : 0.10s
- Lsq Submission Time Seconds : 3.80s
----------------------------------------------------------------------
DETAILED PER-DOCUMENT TIMING SUMMARY:
* CA1/CIBIL (CIBIL.json):
- S3 Download : 0.12s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.17s

* Details/Details (Activity.json):
- S3 Download : 0.11s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.17s

* Details/Details (Lead.json):
- S3 Download : 0.03s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.03s

* KP/CIBIL (CIBIL.json):
- S3 Download : 0.09s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.13s

* MA/CIBIL (CIBIL.json):
- S3 Download : 0.10s
- Azure OCR : 0.00s (Pages: 0, Avg/Page: 0.00s)
- Embedding Gen : 0.00s (Chunks: 0)
- Retrieval Time : 0.00s
- LLM Inference : 0.00s (Load: 0.00s, Prompt Eval: 0.00s)
(Tokens: In=0, Out=0, speed=0.0 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 0.15s

* CA1/Address (Business Address Proof [1].pdf):
- S3 Download : 0.19s
- Azure OCR : 3.35s (Pages: 3, Avg/Page: 1.12s)
- Embedding Gen : 0.26s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 1.17s (Load: 0.00s, Prompt Eval: 1.13s)
(TTFT: 0.043s, GPU Util: 100.0%, VRAM: 20213.0/23028.0 MiB)
(Tokens: In=694, Out=50, speed=44.2 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.01s

* CA1/UDYAM certificate (UDYAM certificate [1].pdf):
- S3 Download : 0.21s
- Azure OCR : 3.40s (Pages: 5, Avg/Page: 0.68s)
- Embedding Gen : 0.40s (Chunks: 14)
- Retrieval Time : 3.87s
- LLM Inference : 47.48s (Load: 0.00s, Prompt Eval: 47.44s)
(TTFT: 0.042s, GPU Util: 82.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1592, Out=2000, speed=42.2 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 150.41s

* CA1/GST Registration Certificate (GST Registration Certificate [1].pdf):
- S3 Download : 0.23s
- Azure OCR : 3.34s (Pages: 3, Avg/Page: 1.11s)
- Embedding Gen : 0.29s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 2.53s (Load: 0.00s, Prompt Eval: 2.49s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1020, Out=108, speed=43.4 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 6.43s

* KP/Aadhar Card (Aadhar Card [1].pdf):
- S3 Download : 0.20s
- Azure OCR : 3.25s (Pages: 2, Avg/Page: 1.62s)
- Embedding Gen : 0.19s (Chunks: 1)
- Retrieval Time : 0.00s
- LLM Inference : 1.97s (Load: 0.00s, Prompt Eval: 1.93s)
(TTFT: 0.042s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1203, Out=84, speed=43.5 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.65s

* KP/Address (Key Person POA - Aadhar [1].pdf):
- S3 Download : 0.19s
- Azure OCR : 3.41s (Pages: 2, Avg/Page: 1.70s)
- Embedding Gen : 0.20s (Chunks: 1)
- Retrieval Time : 0.00s
- LLM Inference : 0.91s (Load: 0.00s, Prompt Eval: 0.87s)
(TTFT: 0.036s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=396, Out=39, speed=44.6 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 4.74s

* KP/Address (Key Person POA - Aadhar [2].pdf):
- S3 Download : 0.16s
- Azure OCR : 3.25s (Pages: 2, Avg/Page: 1.62s)
- Embedding Gen : 0.28s (Chunks: 5)
- Retrieval Time : 0.00s
- LLM Inference : 1.18s (Load: 0.00s, Prompt Eval: 1.14s)
(TTFT: 0.039s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1107, Out=50, speed=43.9 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 4.91s

* KP/PAN Card (PAN Card [1].pdf):
- S3 Download : 0.17s
- Azure OCR : 3.30s (Pages: 2, Avg/Page: 1.65s)
- Embedding Gen : 0.22s (Chunks: 3)
- Retrieval Time : 0.00s
- LLM Inference : 1.55s (Load: 0.00s, Prompt Eval: 1.51s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1110, Out=66, speed=43.7 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 5.27s

* MA/Address (Business Address Proof [1].pdf):
- S3 Download : 0.19s
- Azure OCR : 3.26s (Pages: 3, Avg/Page: 1.09s)
- Embedding Gen : 0.23s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 1.17s (Load: 0.00s, Prompt Eval: 1.13s)
(TTFT: 0.039s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=694, Out=50, speed=44.2 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 4.89s

* MA/UDYAM certificate (UDYAM certificate [1].pdf):
- S3 Download : 0.18s
- Azure OCR : 4.50s (Pages: 5, Avg/Page: 0.90s)
- Embedding Gen : 0.42s (Chunks: 14)
- Retrieval Time : 0.20s
- LLM Inference : 47.47s (Load: 0.00s, Prompt Eval: 47.43s)
(TTFT: 0.042s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1592, Out=2000, speed=42.2 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 52.80s

* MA/GST Registration Certificate (GST Registration Certificate [1].pdf):
- S3 Download : 0.20s
- Azure OCR : 3.26s (Pages: 3, Avg/Page: 1.09s)
- Embedding Gen : 0.22s (Chunks: 4)
- Retrieval Time : 0.00s
- LLM Inference : 2.53s (Load: 0.00s, Prompt Eval: 2.49s)
(TTFT: 0.041s, GPU Util: 100.0%, VRAM: 20702.0/23028.0 MiB)
(Tokens: In=1020, Out=108, speed=43.4 tok/s)
- JSON parsing : 0.00s
- Total Doc Time : 6.25s

----------------------------------------------------------------------
BOTTLENECK ANALYSIS & MIGRATION RECOMMENDATIONS:
- Slowest Document : CA1/UDYAM certificate (150.41s)
- Primary Bottleneck : LLM Inference (107.96s, 42.6%)

[RECOMMENDATION] Optimize LLM Inference Layer:
- If using Ollama, consider replacing it with vLLM which supports Continuous Batching and PagedAttention, increasing throughput by 3x-10x.
- If already using vLLM, optimize using tensor parallelism, model quantization (AWQ/GPTQ), or by adjusting VRAM constraints.
======================================================================
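Note on the report above: the listed stage times do not add up to the 253.47s end-to-end latency. The sketch below sums the pipeline and orchestration stages and reports the remainder. It assumes the stages run sequentially without overlap; the function and variable names are hypothetical. The same gap appears in CA1/UDYAM certificate, whose 150.41s total is far larger than the sum of its listed stages, so the "Primary Bottleneck" line may understate time spent outside the instrumented stages.

```python
# Stage totals in seconds, copied from the report above.
END_TO_END = 253.47
pipeline = {
    "download": 2.38,
    "ocr": 34.31,
    "embedding": 2.70,
    "retrieval": 4.06,
    "llm": 107.96,
}
other = {
    "verification": 0.03,
    "cibil": 0.03,
    "final_pdf": 0.19,
    "s3_upload": 0.10,
    "lsq_submission": 3.80,
}


def unaccounted_seconds(end_to_end: float, *stage_groups: dict) -> float:
    """Return wall-clock time not covered by any listed stage.

    Assumes stages are sequential; overlapping stages would make the
    true gap larger than this estimate.
    """
    accounted = sum(sum(group.values()) for group in stage_groups)
    return end_to_end - accounted


gap = unaccounted_seconds(END_TO_END, pipeline, other)
print(f"Unaccounted: {gap:.2f}s ({gap / END_TO_END:.1%} of end-to-end)")

# Same check for the slowest document (CA1/UDYAM certificate).
ca1_udyam_stages = {"download": 0.21, "ocr": 3.40, "embedding": 0.40,
                    "retrieval": 3.87, "llm": 47.48, "json": 0.00}
doc_gap = unaccounted_seconds(150.41, ca1_udyam_stages)
print(f"CA1/UDYAM unaccounted: {doc_gap:.2f}s")
```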
     
 