Ai – Multi-Modal WhatsApp Conversational Agent (Voice, Vision & PDF)

Key Features:

Voice-to-Voice Interaction: The agent transcribes incoming voice notes and can respond back with a high-quality AI-generated voice, making it accessible and hands-free.
Visual Intelligence: Powered by GPT-4o-mini, the bot can “see” and describe images, identify objects, and answer specific questions about photos sent by the user.
Instant Document Processing: Automatically extracts and summarizes text from PDF documents, allowing for quick information retrieval without human intervention.
Short-Term Memory: Remembers the last 10 interactions in a conversation window, ensuring the AI maintains context and doesn’t ask repetitive questions.

Medical Clinics & Laboratories: To handle voice-recorded symptoms from patients, read digital prescriptions (PDF), or analyze reports.
Automobile & Industrial Repair Shops: For technicians who need to send photos of damaged parts for instant identification or troubleshooting.
HR & Recruitment Agencies: To automate the screening of resumes (PDFs) and handle initial voice-based candidate inquiries.
E-commerce Retailers: To allow customers to send photos of products they are looking for or send voice notes for orders.

WhatsApp Business API Credentials: Access to their Meta Developer App or a phone number for the WhatsApp Cloud API.
OpenAI API Key: To power the vision, voice, and text intelligence (or we can provide it as part of the monthly fee).
Business Instruction Document: A clear description of the bot’s role, FAQ list, and preferred tone of voice (Formal vs. Friendly).
Escalation Contact: A specific phone number or department name to mention if the AI cannot resolve a query.
PDF Knowledge Base: Any specific manuals, price lists, or brochures they want the AI to “read” and remember.