We Added AI to Three Production Products. Here's What Actually Worked.

LLMs in production are nothing like LLMs in demos. We integrated AI into Intellis ERP, CrewHRM, and BookMyDoctor — and learned hard lessons about latency, hallucination, user trust, and when AI genuinely helps versus when it's just theatre.

Intellis Idea · 7 April 2025 · 5 min read

Everyone is adding AI to their products in 2025–2026. The demos look impressive. The announcements get LinkedIn engagement. The reality of running AI features in production, serving real users in Bangladesh, is considerably more sobering.

We integrated AI into three products: Intellis ERP (an AI assistant for financial analysis), CrewHRM (an AI-powered performance review summariser), and BookMyDoctor (a symptom-based appointment routing feature). Here's what actually happened.

Feature 1: ERP financial analysis assistant

What we built

A natural language query interface for ERP financial data. The user types "What were our five biggest expenses last quarter and how did they compare to the same quarter last year?" and gets a structured answer with a mini chart.

What worked

The core use case — answering ad hoc financial questions that would otherwise require pulling a report, exporting to Excel, and spending 20 minutes on analysis — worked well for straightforward questions. Senior finance staff adopted it quickly. Time to insight for common questions dropped from ~25 minutes to under 60 seconds.

What didn't work

Ambiguous questions produced confident but wrong answers. "Show me underperforming products" requires defining "underperforming" — but the model would make an assumption and present it as fact without surfacing the assumption. We had one instance where a sales manager used an AI-generated analysis in a meeting without realising the comparison period was wrong.

We added explicit assumption display ("I've assumed this means sales below last year's average for the category") and a confidence indicator. This helped, but it also revealed that many users skipped reading the assumption when they were in a hurry — the exact situation where the assumption matters most.
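One way to make assumptions harder to skip is to carry them as structured data rather than burying them in the answer text. The following is a minimal sketch, not our production code: the `AssistantAnswer` type, the `render` function, and the three-level confidence scale are all hypothetical names chosen for illustration. It places the assumptions above the answer; whether that ordering changes how hurried users read it is untested here.

```python
from dataclasses import dataclass, field


@dataclass
class AssistantAnswer:
    """Hypothetical container for one assistant reply."""
    text: str
    assumptions: list[str] = field(default_factory=list)
    confidence: str = "medium"  # assumed scale: "low", "medium", "high"


def render(answer: AssistantAnswer) -> str:
    """Render assumptions before the answer so they are the first thing shown."""
    lines = []
    if answer.assumptions:
        lines.append("Assumptions (please check before sharing):")
        lines.extend(f"  - {a}" for a in answer.assumptions)
    lines.append(f"Confidence: {answer.confidence}")
    lines.append("")  # blank line between the header block and the answer
    lines.append(answer.text)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(AssistantAnswer(
        text="Three products fall below the threshold.",
        assumptions=["'Underperforming' means sales below last year's "
                     "average for the category"],
        confidence="low",
    )))
```

Keeping assumptions as a separate field also means the UI can require an explicit acknowledgement before an answer is exported, should that prove necessary.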

Performance in Bangladesh

API latency to Claude and GPT-4 from Bangladesh is higher than from the US or Europe: typically 800–1,500 ms for a simple query and up to 4–5 seconds for complex ones. For a feature that replaces a 25-minute process, 5 seconds is fine in absolute terms, but the wait still felt slow next to the rest of the application. We added streaming responses, which reduced perceived latency significantly.
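Streaming helps because the user sees the first words long before the full answer is ready. The sketch below illustrates that difference with a simulated stream; it makes no network calls, and `fake_model_stream`, `consume`, and the delay value are hypothetical stand-ins rather than any provider's API.

```python
import time
from typing import Iterable, Iterator


def fake_model_stream(chunks: Iterable[str], delay_s: float = 0.01) -> Iterator[str]:
    """Stand-in for a streaming LLM response (hypothetical, no network call)."""
    for chunk in chunks:
        time.sleep(delay_s)  # simulated generation time per chunk
        yield chunk


def consume(stream: Iterator[str]) -> tuple[str, float, float]:
    """Return (full_text, seconds_to_first_chunk, total_seconds)."""
    start = time.perf_counter()
    first = None
    parts = []
    for chunk in stream:
        if first is None:
            first = time.perf_counter() - start  # what the user perceives
        parts.append(chunk)  # a real UI would render the chunk here
    total = time.perf_counter() - start
    # An empty stream has no first chunk, so fall back to the total time.
    return "".join(parts), (first if first is not None else total), total


if __name__ == "__main__":
    text, first, total = consume(fake_model_stream(["Top ", "five ", "expenses ", "..."]))
    print(f"first chunk after {first:.3f}s, full answer after {total:.3f}s")
```

The total time is unchanged by streaming; only the time to first visible output drops, which is the number users actually react to.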

Feature 2: Performance review summariser

What we built

Managers in CrewHRM complete structured performance reviews for each direct report. The AI feature takes the structured inputs — ratings, free-text comments, goal progress — and generates a draft narrative summary for the HR record.

What worked

This was our most successful AI feature. The use case was clean: take structured data, produce structured narrative. The output quality was consistently good. Managers spent 5–8 minutes reviewing and editing the AI draft rather than 20–30 minutes writing from scratch.

Adoption was high because it solved a real pain point — managers universally find performance review documentation tedious — without adding new complexity to the workflow.

What didn't work

We ran into two cultural issues we hadn't anticipated. First, some Bangladeshi managers were uncomfortable with AI-generated text in formal HR records: the concern was about authenticity and defensibility if the review was ever challenged. We added a clear "AI-assisted draft" watermark and an editing trail.

Second, the tone needed calibrating for the South Asian professional context. The AI would generate reviews with direct critical feedback phrased in ways that felt harsh in Bangladeshi professional culture, where critical feedback is typically more indirect. We added a tone parameter and made "constructive" the default.
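To make the "structured in, narrative out" shape and the tone parameter concrete, here is a minimal sketch of prompt assembly. It is an illustration under assumptions, not the CrewHRM implementation: `ReviewInput`, `TONE_INSTRUCTIONS`, `build_prompt`, the wording of the tone instructions, and the 1 to 5 rating scale are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical tone instructions; the real wording would need local review.
TONE_INSTRUCTIONS = {
    "constructive": "Phrase critical feedback indirectly and pair it with a concrete suggestion.",
    "direct": "State critical feedback plainly.",
}


@dataclass
class ReviewInput:
    """Structured fields a manager has already filled in."""
    ratings: dict[str, int]        # assumed 1-5 scale
    comments: list[str]
    goal_progress: dict[str, str]


def build_prompt(review: ReviewInput, tone: str = "constructive") -> str:
    """Assemble a summarisation prompt from structured review fields."""
    if tone not in TONE_INSTRUCTIONS:
        raise ValueError(f"unknown tone: {tone!r}")
    ratings = "\n".join(f"- {k}: {v}/5" for k, v in review.ratings.items())
    comments = "\n".join(f"- {c}" for c in review.comments)
    goals = "\n".join(f"- {g}: {s}" for g, s in review.goal_progress.items())
    return (
        "Write a draft performance review summary for the HR record.\n"
        f"Tone: {TONE_INSTRUCTIONS[tone]}\n"
        "Use only the information below.\n\n"
        f"Ratings:\n{ratings}\n\n"
        f"Manager comments:\n{comments}\n\n"
        f"Goal progress:\n{goals}\n"
    )
```

Rejecting unknown tone values up front keeps a typo in a configuration file from silently falling back to a harsher default.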

Feature 3: Symptom-based appointment routing

What we built

When a patient books an appointment on BookMyDoctor, they can optionally describe their symptoms. The AI suggests the most relevant specialty based on the description, to help patients who don't know whether they need a cardiologist or a gastroenterologist.

What we ripped out

This feature was removed from production after six months. The problems were fundamental:

Hallucination risk in healthcare is not acceptable. The model occasionally suggested incorrect specialties with high confidence.
Liability: in Bangladesh, there's no clear regulatory framework for AI medical suggestions. Our legal advice was to not operate in this space until the framework exists.
Low adoption: patients either knew what specialty they needed or weren't sure enough to trust AI suggestions for something as important as their health.

We replaced it with a simple symptom-to-specialty lookup table — hand-curated, medically reviewed, no AI — that does 90% of what the AI version did with zero hallucination risk.
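A lookup of this kind can be very small. The sketch below shows the shape only: the three entries are illustrative placeholders that have not been medically reviewed, and `SPECIALTY_BY_SYMPTOM`, `FALLBACK_SPECIALTY`, and `suggest_specialty` are hypothetical names. The fallback for unlisted symptoms is also an assumption about how such a table might behave.

```python
# Illustrative placeholder entries only; NOT medically reviewed.
SPECIALTY_BY_SYMPTOM = {
    "chest pain": "Cardiology",
    "heartburn": "Gastroenterology",
    "skin rash": "Dermatology",
}

# Assumed behaviour: anything not in the table goes to a generalist.
FALLBACK_SPECIALTY = "General Practice"


def suggest_specialty(selected_symptom: str) -> str:
    """Return the curated specialty for a symptom, or the fallback."""
    # Normalise case and whitespace so "  Chest   Pain " matches "chest pain".
    key = " ".join(selected_symptom.lower().split())
    return SPECIALTY_BY_SYMPTOM.get(key, FALLBACK_SPECIALTY)
```

Because every possible output is written down in the table, each mapping can be reviewed line by line, which is not possible with a model's free-text suggestions.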

What we learned

AI features still in production after the 12-month review: 2 of 3
User adoption (ERP assistant): 61% of active finance users
Time saved per review (CrewHRM): 22 min per performance review
Time to build vs maintain: 1:3 (ongoing maintenance is real work)

The pattern that works: AI features that take structured input and produce structured output in a low-stakes domain. The pattern that fails: AI features that make recommendations in high-stakes domains, or that operate on ambiguous inputs with ambiguous outputs.

