THE CLIENT

A Trusted Competitive Intelligence Firm Tracking Brand Promotions

The client operates a competitive intelligence business based in the United States. Their SaaS product specializes in capturing and analyzing rivals' direct marketing activity. With a research scope spanning email, social media, and promotional campaigns, they give brands across multiple verticals — including banking, insurance, retail, credit cards, energy, and telecom — a structured view of what competitors are spending, promoting, and positioning. Their service helps enterprise clients move faster and with more confidence in response to competitor actions.

PROJECT REQUIREMENTS

Image Annotation Services for 250K+ Monthly Retail Promotions

The client expanded their previous data management engagement with SunTec Data, enlisting a separate specialized team for consistent, high-volume annotation. Each month, the team processed PDF documents from HTML email campaigns — identifying and categorizing retail promotional blocks, then enriching them with structured metadata. These annotations powered the client's Relative Promotional Value (RPV) metric for benchmarking reports.

The project scope was as follows:

  • Identify and annotate distinct retail promotional blocks within PDF documents derived from HTML email campaigns.
  • Draw precise bounding box annotations around each identified element using the client’s proprietary data annotation platform.
  • Classify each block into the correct retail category — Entertainment, Food Services, Health & Beauty, Clothing & Accessories, and others — through structured category classification.
  • Apply metadata tagging to each annotated element, capturing promotional value, brand name, and parent company name.
  • Execute brand-entity attribution to link every annotated block to the correct advertiser and parent organization.
  • Maintain a monthly delivery volume of over 250,000 annotations across all batches.
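For illustration, a single annotated record of the kind described above could be represented as a simple data structure. This is a sketch only; the field names and values are hypothetical, not the client's actual schema, which is proprietary to their platform.

```python
from dataclasses import dataclass

@dataclass
class PromoAnnotation:
    """One annotated promotional block (illustrative fields only)."""
    doc_id: str                # source PDF identifier
    bbox: tuple                # (x_min, y_min, x_max, y_max) in page pixels
    category: str              # e.g. "Food Services", "Health & Beauty"
    promotional_value: float   # input to the client's RPV metric
    brand: str                 # advertiser shown in the block
    parent_company: str        # entity used for benchmarking rollups

# Hypothetical example record
block = PromoAnnotation(
    doc_id="campaign_0042.pdf",
    bbox=(120, 340, 560, 610),
    category="Health & Beauty",
    promotional_value=25.0,
    brand="GlowMart",
    parent_company="GlowMart Holdings",
)
print(block.category)
```

In practice, each such record would carry one category and one bounding box per block, matching the one-classification-per-block rule described in the challenges below.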

PROJECT CHALLENGES

High-Volume Retail Image Annotation Across Inconsistent Formats and Ambiguous Categories

To meet these high-quality image-labeling requirements, our team had to address three key challenges that could compromise the value of the client's platform.

Classifying Promotions When Category Boundaries Blur

Retail advertising rarely confines itself to a single product category. A pharmacy chain might run one ad block featuring both prescription services and personal care products. A big-box retailer might advertise tools, clothing, and groceries in a single visual unit. The client's taxonomy required one classification per block, which meant annotators had to determine — not just identify — which category the block primarily represented. There was no image-labeling shortcut for this; it required the consistent application of merchandising logic that had to be learned and sustained across the full team.

Processing PDFs With No Predictable Visual Structure

Source HTML emails came from dozens of organizations, each with its own formatting conventions. Once converted to PDF, these became annotation targets with little structural consistency — varying page layouts, degraded font rendering, inconsistent content block boundaries, and varying resolutions from sender to sender. The team could not build processing efficiency around visual templates as a conventional image annotation project would allow. Every batch effectively required re-orientation before annotation could proceed at full pace.

Building a Reliable Brand Registry for Unfamiliar Promoters

Regional retailers and newer consumer brands frequently appeared in promotional content without clear parent company information. The client's competitive benchmarking was organized at the parent company level, which meant accurate brand-entity attribution was non-negotiable for the integrity of every delivered output. Annotators could not simply tag what was visible in the ad — they had to confirm what was not visible through targeted external research. This brand-entity validation layer added time and specialized skills to every batch that included unfamiliar brands.

OUR SOLUTION

Precise Image Annotation Services and SME-Led Governance

Our delivery approach was built around the expectation of long-term, high-volume performance — not just initial compliance with the project brief. A dedicated 23-person team handled all image labeling work, with roles spanning execution, quality review, domain expertise, and project coordination. From the first batch, the team operated within a structured workflow that addressed format variability, classification consistency, and metadata accuracy as separate workstreams, each with its own governance layer.

Intensive Platform Onboarding before First Delivery

Team training preceded any production work. Annotators learned the client's proprietary annotation tool through hands-on sessions covering bounding box techniques, metadata tagging field requirements, retail category definitions, and the downstream role each annotation plays in the RPV calculation. A shared annotation guideline document, co-created with the client, anchored every decision made throughout the engagement.

Pre-Annotation Batch Assessment Protocol

Each incoming batch was assessed by a senior annotator before the wider team began work. The assessment identified format irregularities specific to that batch — including inconsistent page structures, degraded image rendering, or overlapping content blocks — and generated a format briefing that the team used to calibrate their approach. Annotators encountering unfamiliar batch types could also consult a maintained library of visual reference sheets covering recurring format patterns.

Systematic Category Identification and Precise Bounding Box Placement

Annotators processed each PDF in sequence, identifying promotional blocks, determining their retail category, and precisely placing bounding boxes. Classification decisions were made based on both the visual content and any accompanying text in the ad block, with annotators applying a consistent set of category classification rules developed during onboarding and refined through the engagement's weekly feedback cycles.

Metadata Enrichment and Brand-Parent Verification

Each completed bounding box received a structured metadata set capturing promotional value, brand name, and parent company. An internal brand-parent reference database was actively maintained throughout the project to provide consistent lookup results for brands appearing repeatedly. For new or unfamiliar brands, annotators conducted external web research to confirm parent company attribution before tagging — protecting the integrity of the client's brand-entity attribution data and the competitive analysis built on it.
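The registry-plus-research workflow above can be sketched as a lookup with an explicit escalation path. The brand names and registry contents here are hypothetical; the client's actual database and tooling are not public.

```python
# Hypothetical brand -> parent-company registry (lowercased keys)
BRAND_REGISTRY = {
    "glowmart": "GlowMart Holdings",
    "quickbank": "QB Financial Group",
}

def resolve_parent(brand):
    """Return (parent_company, needs_research).

    Known brands resolve immediately from the registry; unknown
    brands are flagged for external web research before tagging,
    so attribution is never guessed from the ad alone.
    """
    parent = BRAND_REGISTRY.get(brand.strip().lower())
    if parent is not None:
        return parent, False
    return None, True

print(resolve_parent("GlowMart"))   # known brand, no research needed
print(resolve_parent("NewShop"))    # unknown brand, flagged for research
```

Once research confirms a new brand's parent, adding it to the registry makes every future occurrence resolve consistently, which is what keeps parent-level benchmarking stable across batches.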

SME-Led Resolution of Ambiguous Classification Decisions

When an advertisement or promotional copy could not be clearly classified under the existing guidelines, the case was not left to individual judgment. It was escalated to a senior annotator or team lead for review. These subject matter experts assessed flagged items against the client-agreed rules and made the final classification decision. If ambiguity remained even after this review, a client discussion was scheduled to resolve the issue and align on the correct interpretation.

Reference-Led Labeling Governance to Prevent Annotator Drift

Every escalated edge-case decision was documented and added to a shared precedent and reference library, giving the entire team an evolving decision base to rely on. Weekly calibration calls with the client served as the preventive control, ensuring that ambiguous cases, new category rules, and framework changes were resolved centrally and communicated to all annotators simultaneously.

Three-Tier QA Designed to Ensure Annotation Consistency

We implemented batch-level quality control to ensure high labeling accuracy. Each annotator performed a self-review before submission to catch obvious errors. A random sample from every batch was then reviewed by a peer annotator to validate category classification and metadata accuracy. Finally, senior QA analysts reviewed each delivery batch for bounding box precision, metadata completeness, and adherence to the client’s annotation guidelines.
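The second tier above relies on drawing a random sample from each delivered batch for peer review. A minimal sketch of that step follows; the 5% sampling rate is an assumption for illustration, as the case study does not state the actual fraction reviewed.

```python
import random

def sample_for_peer_review(batch, rate=0.05, seed=None):
    """Draw a random subset of a batch for peer review.

    The rate is assumed; a seeded RNG makes the draw reproducible
    for audit purposes. Always reviews at least one item.
    """
    rng = random.Random(seed)
    k = max(1, int(len(batch) * rate))
    return rng.sample(batch, k)

# Hypothetical batch of 1,000 annotation IDs
batch = [f"annotation_{i:04d}" for i in range(1000)]
reviewed = sample_for_peer_review(batch, rate=0.05, seed=42)
print(len(reviewed))  # 50 items drawn for peer review
```

Random sampling per batch, rather than reviewing a fixed slice, avoids systematically skipping any annotator's work and keeps the peer-review tier representative.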

Proactive Prevention of Annotator Drift

Because this annotation engagement ran for 36+ consecutive months with a 23-member team, consistency in the annotated datasets and across annotators had to be actively maintained. Over time, even trained annotators can begin interpreting rules differently, creating annotator drift that can weaken downstream model performance. To control that risk, we established a governance layer combining the evolving reference library, weekly calibration calls with the client, and the three-tier QA process, maintaining annotation accuracy at a stable 98.5%.

Project Outcomes

Sustained 250K+ Monthly Annotations with 98.5% Accuracy over Three Years

When the annotation engagement began, the brief was straightforward: deliver 250,000 annotated records per month at an accuracy level high enough to support competitive intelligence reporting. Over three years, the team not only met that brief — it sustained performance long enough to make the arrangement a permanent part of the client's operational infrastructure.

250K+ Annotations Delivered per Month: Sustained across 36+ consecutive cycles without delays or backlog

50% Reduction in Report Turnaround Time: Structured annotation outputs minimized post-processing on the client’s end

98.5% Annotation Accuracy Maintained: A governance model kept all 23 annotators aligned throughout the engagement

Contact Us

Strengthen your Data Pipeline with High-Accuracy Image Annotation Services

If you are managing a high-volume annotation workflow and need a team that can sustain accuracy and throughput over months or years, SunTec Data has the processes, domain expertise, and quality governance to deliver. From retail annotation services and metadata tagging to AI training data preparation, our team works according to your delivery timelines.

Reach out to us, discuss your requirements, and get a free consultation.