Are We in an AI Revolution? Part Two of Three

Welcome to part two of “Are we in an AI revolution?”  If you haven’t done so already, be sure to read Part One here.

We left off breaking down the types of players in the AI space.  Let’s now dig into the specific AI capabilities that are worth knowing about.

Categories of Capabilities

I divide AI capabilities into two major categories: Toolkits and Applications.  The toolkit category is aimed at enabling developers to build AI capabilities; these are your frameworks and deep learning APIs.  The application category includes AI-powered capabilities that solve specific problems, and I see these coming in two general flavors.  The first is textual understanding, or things a human can “read”: natural language capabilities, bots, and entity analysis/recognition.  The second is visual understanding, or things a human can “see”: image recognition and video recognition.  Let’s tour through some offerings in these categories from the aforementioned behemoths.  You’ll quickly notice that there are a lot of choices.

Toolkits

  • Microsoft Cognitive Toolkit – A free, easy-to-use, open-source, commercial-grade toolkit that trains deep learning algorithms to learn like the human brain.
  • Google TensorFlow – An open-source software library for high-performance numerical computation.
  • Facebook PyTorch – A deep learning framework for fast, flexible experimentation.
  • Amazon SageMaker – A fully managed platform that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale.

These frameworks come complete with sample code and documentation aimed at helping developers build with them and leverage their capabilities to power AI applications.  There is no right answer as to which one a developer should pick.  Each offering grew up in its own unique way, so my advice is to read through its history; that’s usually a good indication of where its strengths lie and when leveraging one over another makes sense.  Your alignment with a PaaS vendor (AWS, Azure, etc.) may also drive you in a certain direction: AI capabilities are becoming core offerings for these vendors, and they are building integrations with these frameworks that lower the barriers to entry for implementing AI technology.
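
To make this concrete, here is a minimal sketch of what working with one of these toolkits looks like, using PyTorch as the example.  The model and data below are placeholders invented for illustration; the workflow itself (define a model, compute a loss, backpropagate, update weights) is the common thread across these frameworks.

    import torch
    import torch.nn as nn

    # Placeholder model: a tiny two-layer classifier (illustrative only).
    model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Placeholder data: a batch of 8 random feature vectors and fake labels.
    inputs = torch.randn(8, 4)
    labels = torch.randint(0, 2, (8,))

    # One training step: forward pass, loss, backpropagation, weight update.
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()
    print(f"loss: {loss.item():.4f}")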

Applications – Textual

  • Microsoft Cognitive Services Language – Allows your apps to process natural language with pre-built scripts, evaluate sentiment, and learn to recognize what users want.
  • Microsoft Azure Bot Service – Lets you build, connect, deploy, and manage intelligent bots that interact naturally with your users on websites, apps, Cortana, Microsoft Teams, Skype, Slack, Facebook Messenger, and more.
  • Google Cloud Natural Language – Reveals the structure and meaning of text through powerful machine learning models behind an easy-to-use REST API.
  • Google Cloud Translation – Dynamically translates most language pairs.
  • Amazon Lex – A service for building conversational interfaces into any application using voice and text.
  • Amazon Translate – A neural machine translation service that delivers fast, high-quality, and affordable language translation.
  • Microsoft Cognitive Service Knowledge – Maps complex information and data to solve tasks such as intelligent recommendations and semantic search.
  • Amazon Comprehend – A natural language processing (NLP) service that uses machine learning to find insights and relationships in text.

Those are just a sampling of the capabilities in the textual applications category.  The end goal is to enable communication with a software system in the same way people interact with each other.  Users can choose voice, chat, email, or whatever method they prefer.  These services allow the system to understand the language the user communicates in, interpret the intent of the user’s request, and respond to the user conversationally.
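
As an illustration of how these services are consumed, here is a hedged sketch that asks Amazon Comprehend for the sentiment of a piece of text.  It assumes AWS credentials are configured and the boto3 library is installed; the sample text is made up.

    import boto3

    # Sketch only: assumes AWS credentials and a supported region are configured.
    comprehend = boto3.client("comprehend", region_name="us-east-1")

    response = comprehend.detect_sentiment(
        Text="I love how quickly support resolved my issue!",
        LanguageCode="en",
    )
    print(response["Sentiment"])       # e.g. "POSITIVE"
    print(response["SentimentScore"])  # confidence per sentiment class

The other vendors’ natural language services follow a similar pattern: send text in, get structured annotations (sentiment, entities, intent) back.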

Applications – Visual

The visual applications category probably gets the least coverage, but I believe it is the most important.  A great framework and a great interaction interface will fall flat if the result is not the answer or information the user was looking for.  Services in this category, such as the image and video recognition offerings from these same vendors, are designed to review, interpret, and understand the context of non-textual data within the system.  The adage that “a picture is worth a thousand words” holds true to this day, and we’re finally getting to the place where finding the picture (or video) via search is possible.  As we’ve said before, metadata is key to all of this, and these capabilities drive the accuracy and breadth of coverage of metadata within the system.
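
To show how these services feed that metadata, here is a sketch that uses Amazon Rekognition to label an image stored in S3.  The bucket and object names are hypothetical placeholders; the labels that come back are exactly the kind of metadata a search index can consume.

    import boto3

    # Sketch only: the bucket and key below are hypothetical placeholders.
    rekognition = boto3.client("rekognition", region_name="us-east-1")

    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": "my-media-bucket", "Name": "photos/beach.jpg"}},
        MaxLabels=10,
        MinConfidence=75.0,
    )
    # Each label (e.g. "Beach", "Person") becomes searchable metadata.
    for label in response["Labels"]:
        print(f'{label["Name"]}: {label["Confidence"]:.1f}%')

Indexing those labels alongside the image is what finally makes the “thousand words” in a picture searchable.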

Stay tuned for part three of this blog, where I’ll provide some real-world guidance on AI and search capabilities.