
Automating Text Analytics

Posted on May 21, 2018 (updated June 24, 2020) by Mike Gregory, Director of Systems Engineering

We’ve all read the stats about how much of the content currently being generated and managed within organizations is unstructured, and we’ve worried about the opportunities that might be missed if we are not able to leverage the knowledge this content contains. Indexing the content with your favorite search engine is a start, but it doesn’t go nearly far enough.

The content also needs to be tagged with metadata, which enables things like context-based personalized delivery, dynamic boosting of results, and what I like to call “search and refine” – a usage pattern that is ubiquitous on internet sites but often lacking in intranet search experiences. The challenge, of course, is that the users who create the content will rarely, if ever, tag content consistently.

To solve this problem, an automated approach is needed. Auto-tagging capabilities fall into one of two categories: rules-based or Machine Learning / AI (statistics-based). A rules-based approach has the advantage of being easy to understand and implement, but it requires human input to work. The ML / AI approach, on the other hand, largely automates the process and can “learn” over time, but it can be difficult to understand and debug when things don’t go smoothly. So the question is, which approach is better?

Neither.

Both methods have their place, and at BA Insight we advocate combining the two approaches to auto-tagging and applying each in ways that take advantage of their strengths. Let’s take a closer look at each method:

Rules-based

When a clear set of defined metadata tags exists, a rules-based approach is, in my experience, the clearest and easiest-to-understand mechanism for applying those tags to your content automatically. There are several concepts that apply here:

  • Taxonomy creation. The first step is to determine which metadata would be of the most help to users in finding the content they need. This is a good place to apply a statistical methodology to analyze your content and suggest tags based on that analysis.  Rather than having your most knowledgeable users pore through thousands of documents looking for terms which appear frequently, BA Insight offers an automated analysis capability that does this more quickly and accurately than any human could. Take advantage of your users to verify that the suggested terms make sense, but automate the heavy lifting.
  • Rules. Now that you have a taxonomy of terms, the most accurate way to control assignment of these terms to content is via Rules. Sure, there is some work here, but in the long run using Rules to assign known terms ensures that it’s done right. Applying your understanding of the content to the creation of Rules enables accurate identification of the concepts or categories which apply to each piece of unstructured content, which in turn becomes invaluable when users search for that content.  It is critical that the rules be sufficiently flexible to allow you to apply only the tags that accurately describe what a document is truly about rather than just what it may mention. In other words, you need to be able to evaluate potential tags to decide if they truly reflect what the document is about. In addition to a widely extensible rules development interface, our products incorporate mechanisms to enable you to test your rules before using them to actually tag any content.
  • Patterns. There may be cases where you need to apply concepts like security classification to your content. This is an example of where the human touch brings a lot of value, because it requires an understanding of what “sensitive” means in your organization. Obvious examples come to mind, such as a document that contains Social Security Numbers or birth dates. But there are also things like part numbers, which may appear innocuous but in the context of your organization represent potential risk if they are widely disseminated. Once you have identified documents that may contain such information, you can decide how to handle them: kick off a review workflow, exclude them from search results, and so on.
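
To make the Rules and Patterns ideas concrete, here is a minimal sketch in Python. It is not BA Insight's rule syntax or API: the `Rule` class, the `min_hits` threshold, and the `sensitive:ssn` tag are all hypothetical names chosen for illustration. The threshold is one simple way to separate a document that is about a topic from one that merely mentions it, and the regular expression assumes the common dashed 3-2-4 layout for US Social Security Numbers.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rule:
    """Hypothetical rule: apply `tag` when the terms occur often enough."""
    tag: str
    terms: Tuple[str, ...]
    # Require repeated mentions so a passing reference does not tag the document.
    min_hits: int = 3


# Assumed layout: three digits, two digits, four digits, separated by dashes.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def count_hits(text: str, terms: Tuple[str, ...]) -> int:
    """Count whole-word, case-insensitive occurrences of any of the terms."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(term.lower()) + r"\b", lowered))
        for term in terms
    )


def apply_rules(text: str, rules: List[Rule]) -> List[str]:
    """Return the tags whose rules fire, plus a flag for SSN-like patterns."""
    tags = [r.tag for r in rules if count_hits(text, r.terms) >= r.min_hits]
    if SSN_PATTERN.search(text):
        tags.append("sensitive:ssn")
    return tags
```

Testing a rule against a handful of sample documents before it touches real content is cheap with a function like `apply_rules`: pass in the text and the candidate rules, and inspect the tags that come back.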

Entity Extraction

Entity Extraction is a tagging technique which involves identifying “entities”, which are typically names of something, and tagging the document with the entities it contains.  This can also include things like product names or project names – anything that can be identified by context and pattern is a candidate for Entity Extraction. There are several ways to do this:

  • List. When you have a finite number of potential values such as product or customer names, this technique makes sense. You just provide a list of potential values to our software, and any entities found within documents will be automatically extracted and applied as tags.
  • Entity Recognition. This is a technique whereby Machine Learning models are “trained” to be able to identify certain types of entities such as names of people, places, projects, or products based on context and other factors, without a defined list as a reference. Then, when entity names that match what the models have been trained to identify appear within the content, they are extracted as tags.  It’s important to be able to differentiate between entities with similar names and correctly identify Newton the scientist vs. Newton the city in Massachusetts. We also provide the ability to train our software to identify new entities which are important to your business so that you are not limited to ones we define.
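
The List technique is simple enough to sketch in a few lines of Python. This is an illustration under stated assumptions, not the product's implementation: the function name `extract_listed_entities` is hypothetical, matching is case-insensitive, and a name only counts when it appears as a whole word or phrase. Entity Recognition, by contrast, needs a trained model and is not shown here.

```python
import re
from typing import Iterable, List


def extract_listed_entities(text: str, known_entities: Iterable[str]) -> List[str]:
    """Return the known entities that occur in `text`, in list order."""
    found = []
    for name in known_entities:
        # Lookarounds keep "Acme Switch" from matching inside "Acme Switchboard".
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(name)
    return found
```

The returned names can be applied directly as tags, since they come from the list you supplied rather than from whatever spelling appears in the document.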

Cognitive Analysis

“Cognitive” is the hottest buzzword in the search market, and for good reason. The technologies that fall under this umbrella have great promise and can revolutionize the way people find information. At BA Insight, we are taking a very different approach from many other players in the market: open technology. Rather than try to create our own set of cognitive services and force customers to do this “our way”, we are integrating with the cognitive suites of products offered by companies like Google, Microsoft, and Amazon. In the same way that we enable our customers to choose the search platform which best suits their needs and use our portfolio of products to enhance the capabilities of that search platform, we enable our customers to decide which vendor of cognitive services is best for them.

The names are different, but several capabilities are common across cognitive suites, and we have incorporated or will incorporate them into our software portfolio:

  • Image and Video Analysis. Up to this point I’ve talked about unstructured content as if it’s all just text, but that’s not the case. There is valuable content in binary formats such as images and video which should be leveraged as well. This is another area where ML can shine. Cognitive Search suites can analyze an image and give you back text which describes what is in the picture, and they can also extract any text which is visible in the image. If the image appears within a document, then the text which describes the image can be added to the other text within the document. When that document is analyzed using the methods described above, all content within the document can be considered, improving the accuracy of the tags that are applied.
  • Sentiment Analysis. Machine learning models can go beyond the content or meaning of a document to get to the “tone” and determine whether the document is written in a positive or negative manner. This kind of understanding can be used to determine what content best meets a request. For example, you may want to search for content with a negative tone to edit and improve, or for documents with a positive tone to deliver to customers. This technique is especially applicable to specific use cases such as Customer Service, but it is also broadly useful for Enterprise Search.
  • Natural Language Understanding. The objective of applying Natural Language Understanding is to get to the “intent” within analyzed text. It can be applied both as part of content classification and search.  For example, understanding of the intent of the content of a document can enable things like assigning a document type such as Documentation, Procedure, Policy, etc. When users are searching for documents using natural language, an understanding of the way in which they ask the question can increase the relevance of search results by ensuring the right types of documents are returned first.
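
The image analysis step described above, where text derived from embedded images is added to the document's own text before tagging, can be sketched as follows. The vendor call is deliberately left as a plain callable, because each cognitive suite has its own API; `ImageAnalyzer` and `merge_image_text` are hypothetical names, and the sketch assumes the analyzer returns a descriptive string (or an empty string when it finds nothing).

```python
from typing import Callable, Iterable

# Stand-in for any vendor's image-analysis call: image bytes in, text out.
ImageAnalyzer = Callable[[bytes], str]


def merge_image_text(body: str, images: Iterable[bytes], analyze: ImageAnalyzer) -> str:
    """Append non-empty image descriptions to the document text."""
    captions = [analyze(image) for image in images]
    captions = [c.strip() for c in captions if c and c.strip()]
    # The combined text is what the tagging rules would then evaluate.
    return "\n".join([body] + captions)
```

Keeping the analyzer as a parameter mirrors the open-technology approach: the same merging logic works regardless of which vendor's service produces the descriptions.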

Bottom line:

Analyzing and tagging unstructured content for findability is not a one-size-fits-all task.  It is best to apply a combination of human understanding and automated capabilities to apply metadata to the ever-growing body of content so that it can be easily found and utilized by the people who need it most within your organization. It’s also important to note that there may be cases where multiple mechanisms are needed to correctly classify your content. For example, a rule may look for a combination of an extracted entity such as a person’s name and a date in close proximity and deem that combination to be sensitive.
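
As an illustration of that last example, the following Python sketch flags a document when a person's name appears close to a date. Everything here is an assumption made for the example: the function name `name_near_date`, the `MM/DD/YYYY`-style date pattern, and the 40-character proximity window. The person names are expected to come from a separate entity extraction step.

```python
import re
from typing import Iterable

# Assumed date layout for the sketch: 1-2 digits / 1-2 digits / 4 digits.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def name_near_date(text: str, person_names: Iterable[str], max_chars: int = 40) -> bool:
    """True if any extracted name lies within `max_chars` of a date."""
    dates = list(DATE_PATTERN.finditer(text))
    for name in person_names:
        for name_match in re.finditer(re.escape(name), text):
            for date_match in dates:
                # Distance between the two spans; 0 if they touch or overlap.
                gap = max(
                    date_match.start() - name_match.end(),
                    name_match.start() - date_match.end(),
                    0,
                )
                if gap <= max_chars:
                    return True
    return False
```

A document for which this returns `True` could then be routed to whatever handling your organization has chosen for sensitive content.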

This entry was posted in Blog and tagged AutoClassification, Cognitive Search, Metadata.

Perspectives from our CEO,
Massood Zarrabian

Organizations have long been struggling with how to make knowledge assets available to employees, partners, and customers. Although there have been major technological advances in how this information is captured and made available over the past two decades, these have mostly been around a single business process. For example, when I joined Servicesoft, we were pioneering the idea of using search and classification engines to help transfer information to customers as part of the emerging eService market. At OutStart, we initiated the idea of objectifying learning and making the development of learning component-based so that it could be consumed via a “just-in-time” model. This meant that learners interacted only with learning objects that helped them increase their knowledge, and therefore their value, to the organization. In the last decade, another silo of information capture has emerged in the form of Social Business Software, making it easier than ever to capture nuggets of knowledge that can help others.  The interesting dynamic in all of this is that there are large investments being made to capture information and knowledge assets, but very little of it is actually accessible by employees or customers.

In the ‘90s, multiple software vendors tried to address the issue of information access by providing software to implement portals, whether they were used for helpdesk integration, customer support, intranets, or R&D to help with the collaboration and reuse of IP.  As the technology evolved and search engines and appliances became available, the portals were replaced with implementations of enterprise search. The problem with this approach is that it views the search engine as a ‘one size fits all’ solution, as opposed to viewing it as an enabling technology that helps organizations address a business issue. Failed enterprise search projects became the norm as companies tried, and continue trying, to resolve their information access challenges by implementing new search appliances while they still have underlying issues around integration with other systems, classification and tagging, and a sub-par user experience.

I joined BA Insight because I saw an immense opportunity to transform the way enterprise search is being implemented.  I am sure you know that the amount of content that is being generated is growing exponentially across an increasing number of sources, and the inability to find the information assets our people need is costing us billions of dollars in lost productivity while leading to low morale and customer dissatisfaction.

When I evaluated the opportunity at BA Insight, I found the company to be uniquely positioned to provide a new approach to the unification of information that stops the pattern of enterprise search failures.  We do this by transforming SharePoint, which has become ubiquitous across enterprises, into a unified information access platform that enables fast implementation of search-driven applications at a fraction of the cost of other options while de-risking search projects.

There are many notable and successful search-driven applications available on the Consumer Internet. Many are ‘killer apps’, and as a class they have fundamentally changed the way people interact with information.  However, in comparison, corporate Intranets, customer support portals, helpdesk applications or knowledge management solutions don’t come close to being killer apps, nor do they provide a remarkable user experience.

I found it intriguing that BA Insight has the technology, people, and partners to help catapult enterprise search to the next level.  We replace a people-intensive, SI-oriented approach to implementation with a technology-based approach that is lower cost, lower risk, easier to implement, and easier to upgrade. We do the heavy lifting to make our products work with different versions of SharePoint as well as various versions of other software applications that exist within a customer’s infrastructure.  We automate how content is tagged to improve findability and also provide out of the box capabilities to improve the user experience with how information is found and accessed.

I believe that in order for a company to be wildly successful, it must demonstrate the following five important attributes:

First is customer-centricity. Customers put their trust in startups with a vision, and we must partner with them to make sure they succeed. I am so proud of all of the customers I have had the honor of serving, as well as their achievements, and I consider it the cornerstone of my past success and BA Insight’s future success.

Second is our team. Our people have the experience and the desire to help our customers implement and deploy incredibly powerful applications. We want to help our customers build killer apps, as opposed to search portals. This is a team of high energy, committed, customer-centric experts who have embraced the idea that search-driven applications could be a lot better and are working to change how they get implemented.  I am fortunate enough to have become part of the BA Insight team and am so proud of everyone who works here. In a short period of time, we have made incredible progress on many fronts and I feel strongly that everyone’s loyalty and belief in the company will continue to provide an incredible depth for us and help our customers transform how they work with search technology.

Third is the technology. We have a broad set of capabilities that extend the value of existing SharePoint investments, so we eliminate the need to change the underlying infrastructure, saving a lot of time and money.  We have over 90 out of the box connectors, a world class auto-tagging engine for content within and outside of SharePoint, visual refiners to let users drill down and find information quickly, an active workspace with content assembly, and document preview capabilities.  We also provide a lot of flexibility in how our technology can be implemented.  Our full platform, for example, is particularly well-suited for new projects.  On the other hand, if search-driven applications have already been implemented within an organization, then components of our platform can be leveraged to augment the existing technology to improve results.

Fourth is to be financially responsible. Company success is about investment and return on that investment, and it needs to be measured on two fronts:

    1. ROI, which is the pure quantitative and financial analysis, often measured as productivity gains.
    2. My preferred approach is return on value (ROV), which is qualitative and focused on items such as customer loyalty, employee morale, better decision-making, and the ability to find information faster to increase customer responsiveness. Many of these things impact productivity and are therefore ROI, but they aren’t often measured and are therefore difficult to quantify. Isn’t the impact of finding the right knowledge to be able to do one’s job priceless? If we are smart about how we invest our time, money, and energy based on return on value, then we will naturally bring tremendous value to our customers and market and end up as a growing, profitable company that is resilient through the ups and downs of the economy and market.

And fifth is promoting a strong work/life balance. Work is important, but we cannot forget our families and shouldn’t compromise being with them due to office pressures.  For a long period of time I was a workaholic, but after my first son was born I committed to change. Over the years I have done my best to practice what I preach, successfully balancing work and family life, and I encourage our employees to do the same. I enjoy nothing more than spending time with my incredible wife and our great boys.

These attributes are prevalent at BA Insight, and I am very enthusiastic about the opportunities that lie ahead.  I am extremely confident that we are well positioned to enable the future of unified information access as we help visionary organizations around the world realize the value of the collective intelligence that is being captured every day within the massive volumes of content they produce.
