Skaylink uses AI to analyze large quantities of data for IMWF

Mining for meaning with AI

IMWF Institute for Management and Economic Research has offered its customers comprehensive media, market, and reputational analyses since 2013. To do so, the institute gathers data from more than 438 million websites on tens of thousands of companies and brands worldwide. In a single night, IMWF analyzes more data than 1,000 individuals could in an entire year. To process this volume of data, the institute sought an experienced IT partner.

“The collaboration with Skaylink has been vital for our company’s success. The experts at Skaylink understand our business model. They prepare the data for our analysis quickly and cost-effectively.”

This requires an IT partner that fully understands what the analysis demands.

Natural language as a challenge

IMWF needs more than just numerical analysis to provide customers with meaningful insights into their company and products. The institute acquires in-depth insights into individual issues from web sources such as social media posts, customer reviews and news stories.

Analyzing natural language poses several challenges, including:

  • Sentiment: Do the reports about the company tend to be more positive, neutral, or negative?
    Affected companies interpret the term “cyber incident” as negative. For cyber insurers, it is a neutral term.
  • Current events: Have current events caused the meaning of words to change?
    During the COVID-19 crisis, for example, the way we understood the term “reduced working hours” changed. And terms like “number of cases” in the context of companies became more neutral than they were before the pandemic.
  • Context: Does a text contain irony, for example?
    Language is used differently in a tweet than it is in a press release for an annual report.

IMWF also offers its customers analyses that can cover, for example, all core EU languages. To overcome these and many other challenges in the automated analysis of large quantities of data, Skaylink developed a comprehensive automated process for IMWF.

Brief summary of the project

Company: IMWF GmbH

Challenge: Quick and cost-effective analysis of large quantities of data in natural language

Solution: Combination of rule- and AI-based analysis methods

Technologies used: Terraform, Docker, AWS ECS, EC2 Spot Instances, AWS Glue, Lambda, Athena, Step Functions, and state-of-the-art models such as BERT, Claude (Anthropic) and GPT (OpenAI)

Rule-based data cleansing

Skaylink receives the raw data from website crawling from an IMWF partner and enriches it with data it has crawled itself. Using AWS technologies, Skaylink processes about 2.5 million articles in a wide range of languages each night, detects relevant texts, and filters them by companies, brands, topics and special events so that they are ready in the morning. The data passes through a sophisticated automated process that combines rule-based models and AI methods.

Skaylink uses an internal database with a web interface (an entity app) to manage the more than 30,000 relevant companies, brands and topics to be monitored.
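The rule-based filtering step can be illustrated with a minimal sketch. It assumes that each monitored entity is stored with a list of aliases and that an article counts as relevant when any alias appears as a whole word. The entity names, aliases and function names below are hypothetical and are not taken from the actual entity app.

```python
import re
from collections import defaultdict

# Hypothetical entity list; the real one is managed in the entity app.
ENTITIES = {
    "Acme Corp": ["Acme", "Acme Corp"],
    "Globex": ["Globex", "Globex AG"],
}


def build_patterns(entities):
    """Compile one case-insensitive, whole-word pattern per entity."""
    patterns = {}
    for name, aliases in entities.items():
        # Longest alias first so "Acme Corp" is tried before "Acme".
        ordered = sorted(aliases, key=len, reverse=True)
        alternatives = "|".join(re.escape(alias) for alias in ordered)
        patterns[name] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return patterns


def filter_articles(articles, patterns):
    """Map each entity to the ids of the articles that mention it."""
    hits = defaultdict(list)
    for article_id, text in articles:
        for name, pattern in patterns.items():
            if pattern.search(text):
                hits[name].append(article_id)
    return dict(hits)


ARTICLES = [
    (1, "Acme Corp reports record revenue."),
    (2, "Weather stays mild this week."),
    (3, "Globex AG and acme announce a partnership."),
]

RESULT = filter_articles(ARTICLES, build_patterns(ENTITIES))
```

Articles that match no entity, such as the second one here, drop out before any more expensive analysis runs.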

The cloud infrastructure used every day is defined with Terraform as infrastructure as code (IaC). The application code runs in Docker containers and is deployed to the AWS Cloud via a GitHub connection to AWS services such as CodePipeline and CodeBuild. This allows the existing system to be adapted and scaled quickly. Within the cloud, AWS solutions such as Elastic Container Service (ECS), EC2 Spot Instances, SQS, AWS Lambda, Step Functions, S3, Kinesis, Glue, Athena and others provide the framework for the entire pipeline.

Quick analysis adjustments thanks to AI

Skaylink uses AI for data analysis. With rule-based methods, the experts detect grammatical structures and classify texts based on word lists and complex sequence patterns. They also use large language models (LLMs). These include base models such as BERT, which Skaylink fine-tunes with annotated data from IMWF and tailors to the context of the application. Special analyses are occasionally enriched with recent models such as Claude 3 from Anthropic, which are available via Amazon Bedrock. This combination of rule-based and model-based approaches makes it possible to balance the individual strengths and weaknesses of the respective technologies. It also provides more options for intervening in particular cases. The ability to adapt the latest LLMs and incorporate industry-specific information gives IMWF the precision and individualization that are essential for its studies.
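One way to combine the two approaches is sketched below, assuming a simple precedence scheme: word-list rules decide when they give an unambiguous answer, and a model decides otherwise. The word lists, the function names and the stub model are hypothetical. In practice the model would be a fine-tuned classifier such as BERT, and the rules would also cover grammatical structures and sequence patterns.

```python
from typing import Callable, Optional, Tuple

# Hypothetical word lists for illustration only.
NEGATIVE_TERMS = {"scandal", "recall", "lawsuit"}
POSITIVE_TERMS = {"award", "record profit"}


def rule_based_label(text: str) -> Optional[str]:
    """Return a label if the word lists give an unambiguous answer."""
    lowered = text.lower()
    negative = any(term in lowered for term in NEGATIVE_TERMS)
    positive = any(term in lowered for term in POSITIVE_TERMS)
    if negative and not positive:
        return "negative"
    if positive and not negative:
        return "positive"
    return None  # no rule fired, or the rules disagree


def classify(text: str, model: Callable[[str], str]) -> Tuple[str, str]:
    """Prefer the rule result; fall back to the model otherwise."""
    label = rule_based_label(text)
    if label is not None:
        return label, "rule"
    return model(text), "model"


def stub_model(text: str) -> str:
    """Stand-in for a fine-tuned classifier; always answers 'neutral'."""
    return "neutral"
```

Returning the source of each decision alongside the label makes it easy to see which cases were handled by a rule, which is where manual intervention in particular cases would typically take place.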

If requirements change, the experts can retrain the BERT models in half a day. The more data they have, the better the result. With a completely new model design, this would take several weeks. The Skaylink experts regularly spot-check random samples of the AI results to ensure the best possible quality. The models can be further optimized if necessary. This human factor, also known as “human-in-the-loop”, ensures that the existing classifications are correct. It also ensures that new topics are detected at an early stage and that changes in meaning are taken into account.
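A random spot check of this kind could look like the following sketch. It assumes that reviewers label a random sample of classified texts and that the share of matching labels serves as a simple quality indicator. The function names and the choice of metric are illustrative assumptions, not a description of Skaylink's actual review process.

```python
import random


def sample_for_review(records, sample_size, seed=None):
    """Draw a random sample of classified records for manual review."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    size = min(sample_size, len(records))
    return rng.sample(records, size)


def agreement_rate(reviewed):
    """Share of (model_label, human_label) pairs in which both labels match."""
    if not reviewed:
        raise ValueError("no reviewed records")
    matches = sum(1 for model_label, human_label in reviewed
                  if model_label == human_label)
    return matches / len(reviewed)


REVIEWED = [
    ("positive", "positive"),
    ("negative", "neutral"),
    ("neutral", "neutral"),
    ("positive", "positive"),
]
RATE = agreement_rate(REVIEWED)
```

A falling agreement rate would be one possible signal that word meanings have shifted or new topics have appeared and that retraining is worthwhile.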

At the end of processing, Skaylink transfers only the data relevant for analysis to IMWF. This allows IMWF to create reports for customers quickly.

Alongside the quality and speed of the analysis, the Skaylink experts pay attention to the costs of the AWS infrastructure and tap into optimization potential here as well.

Ongoing improvement

The analysis experts at IMWF and Skaylink remain in constant contact to optimize the models further. New IMWF customer projects or current results can prompt such changes. Thanks to their extensive experience, Skaylink’s experts can often make minor adjustments within a few hours, for example to take a closer look at a current scandal.

New technical developments also offer additional optimization potential. “IMWF trusts us to carry out experiments. This means we can continually check whether new technologies offer added value for IMWF,” said Agatha Dabrowski, Senior Consultant AWS Analytics, Skaylink. “This is how we learn together with IMWF.”

Conclusion

“Together with the experts at Skaylink, we constantly improve our analysis methods,” Carola Klaus explained. “I look forward to continuing our long-standing, trusting and successful collaboration.”
