Aladdin private markets


How BlackRock engineers gave NLP a human touch

By Sunil Dalal, Head of Alternatives Engineering within the Aladdin Product Group at BlackRock

For investors searching for needles of data in haystacks of documents, natural language processing has been a boon: machines can now scan tens of thousands of written documents. But the benefit has not come cheaply. It has required teams of analysts to train models by hand, produced niche solutions that cannot scale to other areas, and yielded tools that can read text but struggle to find meaning in tables and graphs. Engineers at BlackRock have pioneered an approach that addresses these problems with a model that reads documents more like a human than a machine, bringing efficiency and scale to the world of unstructured financial data. In the following Q&A, Sunil Dalal, Managing Director at BlackRock and Head of Alternatives Engineering within the Aladdin Product Group, describes the materials, methods, and impacts of BlackRock’s particular flavor of NLP.

1. What challenge do asset owners and asset managers face that this technology solves?

To stay ahead of the market, both asset owners and asset managers need to process information with precision and at scale. They rely on data-driven insights to generate alpha, create reports, comply with regulations, perform due diligence, and unlock opportunities. But getting those insights from multiple sources of unstructured data is a tremendous operational burden. Put simply, asset managers and asset owners need help processing large volumes of quarterly reports and other documents.

In fact, the challenge of extracting data from unstructured sources such as PDFs is not unique to financial services, and natural language processing (NLP) solutions have multiplied to meet these needs across a wide variety of business problems and industries.

But we knew there were gaps in how well these solutions met the needs of industry players.

So we conducted a thorough research exercise, within and beyond financial services, to uncover which NLP solutions our clients were using, homing in on those that extract data points from documents. While we found solutions that can capture meaning from text with up to 90% accuracy, they fell well short when processing tables and charts. That’s because NLP is context-sensitive: when words appear in a sentence, the model reads the words on either side to determine what they mean. But when words are used as labels on charts or headers on tables, that surrounding context is missing and accuracy drops significantly.

Here, we discovered one of the biggest gaps for one of our biggest client-user bases.

In asset management, a lot of very important data appears in tables and charts. We recognized an opportunity to design an NLP capability that can read tables, charts and text in any combination. This represented a huge step forward: it had not been done before, either among the solutions we found in our client research or elsewhere in the industry.

2. How does this new NLP technology work?

Before we describe how it works, it’s important to describe how it came to be. At the BlackRock Hackathon (BlackRock’s annual employee competition to drive new technology inventions and innovations), we explored a way to index the contents of documents such as PDFs thoroughly and make them searchable, along with the relationships between data points. To do both, we needed a model that could “think” (that is, consider the contents) in more than one dimension.

Most existing NLP technology scans words in a linear, sequential way (that is, in a single dimension). It looks at each word in relation to the words on either side and uses that context to discern its meaning.

The human brain works differently.

When it comes to charts and tables, humans actually read in two dimensions. We see bars, and numbers in tables, in relation to each other, while labels such as “Return” or “Performance” stand alone rather than inside a sentence. Existing models can be trained to evaluate these tables, but each model must be fine-tuned for specific tasks and document layouts, and told which text corresponds to which numbers for every type of document. That makes the process onerous and very difficult to scale.
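To make this contrast concrete, here is a toy Python sketch (not BlackRock’s code) that models a sequential reader as a fixed window of neighboring tokens. The function name `linear_context`, the window size and the sample table are illustrative assumptions; the point is only what a word-by-word reader can “see” around a number.

```python
def linear_context(tokens, index, window=2):
    """Return up to `window` tokens on either side of tokens[index]."""
    start = max(0, index - window)
    return tokens[start:index] + tokens[index + 1:index + 1 + window]

# In a sentence, the number sits next to the words that explain it.
sentence = "Fund A posted a net return of 12.4% this quarter".split()
print(linear_context(sentence, sentence.index("12.4%")))
# ['return', 'of', 'this', 'quarter']

# The same fact in a small table, flattened row by row for a sequential reader.
table = [
    ["Fund", "Return", "IRR"],
    ["Fund A", "12.4%", "9.1%"],
    ["Fund B", "7.8%", "6.5%"],
]
flattened = [cell for row in table for cell in row]
print(linear_context(flattened, flattened.index("12.4%")))
# ['IRR', 'Fund A', '9.1%', 'Fund B']  <- the "Return" header is out of reach
```

In the sentence, “12.4%” sits beside “return”. In the flattened table, its nearest label is “IRR”, the header of a different column, which is exactly the kind of misleading context that makes tables hard for sequential models.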

To solve this problem and better extract meaning from tables, we developed a model that mimics the way humans read, interpreting the data as it appears in those non-linear dimensions. (To make this possible, we used a graph transformer network, which considers a much richer set of spatial semantics than the simple word sequences in sentences.)
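The article does not publish the model’s architecture, so what follows is only a minimal sketch of the kind of spatial structure a graph-based model can work with, assuming each token’s page coordinates are available from a PDF parser. Tokens become nodes, and edges record whether two tokens share a row or a column. A graph transformer would then learn attention over such edges; this sketch stops at building the graph. The names `Token`, `build_layout_graph` and `related`, the tolerance value and the coordinates are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Token:
    text: str
    x: float  # horizontal center on the page (assumed to come from a PDF parser)
    y: float  # vertical center on the page

def build_layout_graph(tokens, tol=5.0):
    """Link tokens that share a row (close y) or a column (close x).

    Keys are token texts, so this toy version assumes texts are unique on the page.
    """
    graph = {t.text: set() for t in tokens}
    for a, b in combinations(tokens, 2):
        if abs(a.y - b.y) <= tol:
            relation = "same_row"
        elif abs(a.x - b.x) <= tol:
            relation = "same_column"
        else:
            continue  # no simple spatial relation between these two tokens
        graph[a.text].add((b.text, relation))
        graph[b.text].add((a.text, relation))
    return graph

def related(graph, text, relation, candidates):
    """Return a candidate linked to `text` by `relation`, or None."""
    matches = sorted(n for n, r in graph[text] if r == relation and n in candidates)
    return matches[0] if matches else None

# A small performance table as positioned tokens (coordinates are made up).
page = [
    Token("Fund", 50, 100), Token("Return", 150, 100), Token("IRR", 250, 100),
    Token("Fund A", 50, 120), Token("12.4%", 150, 120), Token("9.1%", 250, 120),
    Token("Fund B", 50, 140), Token("7.8%", 150, 140), Token("6.5%", 250, 140),
]
graph = build_layout_graph(page)
print(related(graph, "12.4%", "same_column", {"Return", "IRR"}))  # Return
print(related(graph, "12.4%", "same_row", {"Fund A", "Fund B"}))  # Fund A
```

Unlike a linear window, the graph connects “12.4%” directly to its column header and its row label, however far apart they are in reading order.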

By making the leap from a sequential model to one that understands the two dimensions of a page, our NLP model can process any document in the same way, without extensive training and onerous tweaking for every document type.

3. How is this innovative technology different from other technologies that are designed to perform automated data extraction from business documents?

What makes this technology unique is its wide range of applications. Many vendors in this space start with a very specific business problem and then train their models to view documents through that specific lens. But from an efficiency standpoint, that’s backwards: it locks their technology into that exact use case, so they have to start from scratch every time they want to solve a new one.

By contrast, our technology is use-case (and context) agnostic. We can run it on any type of document, and it immediately begins to decipher what each word and number means in relation to other numbers, charts, tables and so on, much as a human reader would. It’s also much better at discerning which information is useful and which is unnecessary. Once it ingests the data, users can apply it to any existing workflow designed for a specific business purpose. In short, it is a one-size-fits-all approach to data extraction.
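One way to picture a use-case-agnostic pipeline, offered as a hypothetical sketch rather than the eFront Insight data model: extraction runs once and emits generic (entity, field, value) facts, and each business workflow is simply a different query over those facts. The `Fact` structure, the field names and both workflow functions are illustrative assumptions.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    entity: str  # row label, e.g. a fund name
    field: str   # column header or label, e.g. "Return"
    value: str   # raw value as it appeared on the page

# Hypothetical output of one task-agnostic extraction pass.
facts = [
    Fact("Fund A", "Return", "12.4%"),
    Fact("Fund A", "IRR", "9.1%"),
    Fact("Fund B", "Return", "7.8%"),
    Fact("Fund B", "IRR", "6.5%"),
]

def to_percent(raw):
    """Convert a string such as '12.4%' to the float 12.4."""
    return float(raw.rstrip("%"))

def performance_report(facts):
    """Workflow 1 (hypothetical): returns keyed by fund."""
    return {f.entity: to_percent(f.value) for f in facts if f.field == "Return"}

def funds_below_irr(facts, threshold):
    """Workflow 2 (hypothetical): flag funds whose IRR is below a threshold for review."""
    return sorted(f.entity for f in facts if f.field == "IRR" and to_percent(f.value) < threshold)

print(performance_report(facts))    # {'Fund A': 12.4, 'Fund B': 7.8}
print(funds_below_irr(facts, 7.0))  # ['Fund B']
```

In this framing, adding a third workflow means writing another query over the same facts, not re-tuning the extractor.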

Clients can still design their own processing methods to interpret the data, but the need to hand-tune the model to extract that data has been fundamentally eliminated.

4. How does this technology enable efficiency in the investment process?

This new NLP technology creates efficiencies in two significant ways.

First, it functions like a “Google for PDFs,” providing a single, low-level search function across a universe of documents. Investors who receive regular inflows of data, such as quarterly reports from private market managers, can now scale their operations more easily by analyzing those reports within a specified framework at a set cadence.
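The article does not describe how the search layer is built. A common starting point for this kind of document search is an inverted index, which maps each term to the documents containing it. The sketch below assumes text has already been extracted from each PDF; the function names, file names and contents are made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each term to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the sorted IDs of documents containing every query term (AND search)."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))

# Toy corpus standing in for text already extracted from PDFs.
documents = {
    "q1_fund_a.pdf": "Fund A quarterly report: net IRR 9.1%, capital called 40m",
    "q1_fund_b.pdf": "Fund B quarterly report: net IRR 6.5%, distributions 12m",
    "lpa_fund_a.pdf": "Fund A limited partnership agreement: management fee 2%",
}
index = build_index(documents)
print(search(index, "fund a irr"))        # ['q1_fund_a.pdf']
print(search(index, "quarterly report"))  # ['q1_fund_a.pdf', 'q1_fund_b.pdf']
```

A production search layer would add ranking, phrase matching and table-aware fields; this sketch shows only the term-to-document lookup that makes a large document set searchable.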

Second, it enables the exploration of unstructured documents for investment purposes. Our model makes it possible to discover information relevant to both due diligence and alpha generation without first calibrating the model to a specific business need.

At BlackRock, we’re already using this technology as the backend of eFront Insight, an analytical platform for private markets that offers quarterly report data services to clients. Bringing this technology to eFront Insight radically improved how we automatically extract more than 200 data points from each unstructured document. (In fact, we are now able to extract 96% of the required reference data at 99% accuracy across tables and paragraphs.)
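The two figures above measure different things: how much of the required data is found, and how much of what is found is correct. The article does not define them precisely, so the sketch below assumes coverage = fields extracted / fields required and accuracy = correct values / values extracted, computed on toy data for a single document. The function name and field names are hypothetical.

```python
def extraction_metrics(required_fields, extracted, ground_truth):
    """Return (coverage, accuracy) for one document.

    coverage: share of required fields for which a value was extracted.
    accuracy: share of extracted required fields whose value matches the verified one.
    """
    found = [f for f in required_fields if f in extracted]
    correct = [f for f in found if extracted[f] == ground_truth.get(f)]
    coverage = len(found) / len(required_fields)
    accuracy = len(correct) / len(found) if found else 0.0
    return coverage, accuracy

# Toy example: five required fields, four extracted, one of those four wrong.
required = ["fund_name", "net_irr", "tvpi", "nav", "capital_called"]
extracted = {"fund_name": "Fund A", "net_irr": "9.1%", "tvpi": "1.4x", "nav": "120m"}
verified = {"fund_name": "Fund A", "net_irr": "9.1%", "tvpi": "1.4x",
            "nav": "125m", "capital_called": "40m"}
print(extraction_metrics(required, extracted, verified))  # (0.8, 0.75)
```

Under these assumed definitions, the reported figures would mean that about 96 of every 100 required data points are extracted, and about 99 of every 100 extracted values are correct.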

With this deployment in place, we are now building a due-diligence search platform in partnership with BlackRock Alternative Investors (BAI), and we plan to extend it as a service to Aladdin clients in the future. We are also working on proof-of-concept projects with other businesses at BlackRock to see how our technology can support other unique data workflows.

Ultimately, this technology not only enables us to assist our clients with their workflows, but also demonstrates the way we approach any challenge—great and small.
