Ankur A. Patel

About Ankur A. Patel
Ankur A. Patel is an AI entrepreneur, advisor, and author. He is currently the cofounder and head of data at Glean. Glean uses natural language processing to deliver vendor spend intelligence within an accounts payable solution.
Previously, Ankur was the vice president of data science at 7Park Data, a Vista Equity Partners portfolio company. Ankur used alternative data to build data products for hedge funds and developed a natural language processing-based entity recognition, resolution, and linking platform for enterprise clients. Prior to 7Park Data, Ankur led data science efforts in New York City for Israeli artificial intelligence firm ThetaRay, a pioneer in applied unsupervised learning.
Ankur began his career as an analyst at J.P. Morgan and then became the lead emerging markets sovereign credit trader for Bridgewater Associates, the world's largest global macro hedge fund. He later founded and managed R-Squared Macro, a machine learning-based hedge fund.
He is the author of Hands-on Unsupervised Learning Using Python and Applied Natural Language Processing in the Enterprise, both O’Reilly Media publications.
A graduate of the Woodrow Wilson School at Princeton University, Ankur is the recipient of the Lieutenant John A. Larkin Memorial Prize. He currently resides in New York City.
To subscribe to Ankur’s AI newsletter, please visit ankursnewsletter.com.
Author Updates
Blog post: Deepfakes are fake videos, audio clips, and text — generated by AI — that are deceptively real. Here is the latest on the technology.
Key Takeaways: If you have only a few minutes to spare, here’s what you should know.
Deepfakes are AI-generated media pieces (audio, video, images, or text) that look incredibly realistic and closely imitate a living personality or incident. Because the technology behind them has advanced dramatically, they can often be pretty hard to spot.
The technology behind deepfakes started being developed in the late 1990s. Significant improvements were made in… (1 month ago)
Blog post: TikTok has over 1 billion monthly active users, rivaling competitors such as Instagram and YouTube. TikTok's artificial intelligence algorithm is key to its breakout success.
Key Takeaways: If you have only a few minutes to spare, here’s what you should know.
TikTok uses a best-in-class AI-based recommendation algorithm to drive user engagement. This algorithm is better than that of its competitors such as YouTube and Instagram.
TikTok does not require the user to declare explicit interests. Users have the option to declare interests, but, even if they do not, users will find that the app quickly starts recommending the right videos within a short time after they… (2 months ago)
Blog post: GPT-3 is the best-known text generation model on the market today but is expensive to use. Open source alternatives are now available. Here is a detailed comparison of GPT-3 and its competition.
Key Takeaways: If you have only a few minutes to spare, here’s what you should know.
OpenAI has four GPT-3 model versions: Ada, Babbage, Curie, and Davinci. Ada is the smallest and cheapest to use but performs worst, while Davinci is the largest, most expensive, and best performing of the set.
GPT-3 Davinci is the best performing model on the market today. It has been trained on more data and with more parameters than its open source alternatives, GPT-Neo and GPT-J. (2 months ago)
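As a rough illustration of what switching between those tiers looks like in code, here is a minimal sketch using the openai Python package as it existed for the GPT-3 Completions API (not code from the post); the prompt and token limit are arbitrary, and an OPENAI_API_KEY environment variable is assumed.

```python
# Minimal sketch: querying the four GPT-3 tiers via the legacy Completions API.
# Assumes the pre-1.0 openai package and an OPENAI_API_KEY environment variable.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

for engine in ["ada", "babbage", "curie", "davinci"]:
    response = openai.Completion.create(
        engine=engine,
        prompt="Explain in one sentence what a language model is.",
        max_tokens=60,
    )
    print(engine, "->", response.choices[0].text.strip())
```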
Blog post: Guided by a few text-based commands from humans, DALL-E 2 not only generates but also edits digital images. Here is how it works and what it means for the future of art and design. OpenAI’s initial release of the AI text-to-image generator DALL-E in January 2021 piqued the world’s interest but was quite limited. The generated images had low resolution and at times failed to capture the text commands from humans. OpenAI's second and latest release of the same technology earlier this month (called DALL-E 2) addresses these shortcomings and has received rave reviews so far.
DALL-E 2 is a transformer-based language model that takes text commands from users to create… (2 months ago)
Blog post: This is the third issue of our series, NLP in Action. This series highlights the good and the bad of common methods of natural language processing. In doing so, we hope to spark conversation and curiosity in the world of NLP.
“If the architecture is any good, a person who looks and listens will feel its good effects without noticing.” - Carlo Scarpa. Although the 20th-century Italian architect uttered this in reference to physical architecture, something can be said for the expansiveness… (9 months ago)
Blog post: This is the second issue of our series, NLP in Action. This series highlights the good and the bad of common methods of natural language processing. In doing so, we hope to spark conversation and curiosity in the world of NLP.
Named entity recognition (NER) is a natural language processing tool used for information extraction. Information extraction is defined as retrieving structured information from an unstructured text source.
Also known as entity extraction, identification… (10 months ago)
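Since this entry defines NER only in the abstract, here is a minimal sketch (not from the post) of what entity extraction looks like with spaCy; it assumes the small English model has been installed with python -m spacy download en_core_web_sm, and the example sentence is illustrative.

```python
# Minimal sketch: named entity recognition with spaCy.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Ankur Patel began his career at J.P. Morgan before joining Bridgewater Associates.")

# Each entity is a span of text with a label such as PERSON, ORG, GPE, or DATE.
for ent in doc.ents:
    print(ent.text, "->", ent.label_)
```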
Blog post: This is the first issue of our series, NLP in Action. This series highlights the good and the bad of common methods of natural language processing. In doing so, we hope to spark conversation and curiosity in the world of NLP.
Computers have made working with numerical data a breeze. Anyone with internet access can perform complex functions or create a graph simply by selecting a table of numbers.
This is not the same for textual data. Analyzing text in the past was only… (11 months ago)
Blog post: I'm super excited to announce the launch of my latest book, Applied Natural Language Processing in the Enterprise! Huge shoutout to my co-author, Ajay, who has worked tirelessly with me on this for more than a year. Our book is a technical book on natural language processing (NLP), a subfield of machine learning and artificial intelligence that deals with human (aka natural) language.
For decades, people have used computers and calculators to crunch numbers, avoiding mental math… (1 year ago)
Blog post: Welcome to Ankur’s Newsletter by me, Ankur A. Patel, AI/ML startup founder and author of Applied Natural Language Processing in the Enterprise and Hands-On Unsupervised Learning Using Python.
Sign up now so you don’t miss the first issue.
In the meantime, tell your friends! (1 year ago)
Blog post: Build Once, Sell Many Times
Software as a Service (SaaS) companies have come to dominate the technology space, and there is one metric that sums up just why: gross margin. Gross margin is the difference between the company’s revenue and its cost of goods sold. The higher the gross margin, the more of the revenue the company retains for each and every sale it makes. SaaS companies have very high gross margins - typically 50% on the low end and as much as 80% or more on the high end. (2 years ago)
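Because the post leans on the gross margin definition, here is a tiny illustration with made-up numbers showing the percentage figure the post quotes; none of the values come from the post.

```python
# Tiny illustration with made-up numbers: gross margin as a percentage of revenue.
revenue = 1_000_000           # hypothetical annual revenue
cost_of_goods_sold = 200_000  # hypothetical hosting, support, and delivery costs

gross_profit = revenue - cost_of_goods_sold
gross_margin_pct = 100 * gross_profit / revenue
print(f"gross margin: {gross_margin_pct:.0f}%")  # 80%, the high end the post cites
```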
Blog post: AI is not a binary event
Mainstream media continues to position artificial intelligence as a step function event. According to the media, we are in the pre-AI era today. Then, one day in the future, artificial intelligence will finally arrive, and many of us will lose our jobs overnight. Talk of artificial intelligence stealing jobs overnight may make for good press, but that’s not how technological progress works in the real world.
In industrial automation (e.g., robotics used… (2 years ago)
Blog post: Makers in artificial intelligence today obsess over having the best-performing model; they obsess over developing the model that has state-of-the-art performance on a well-curated dataset, and they focus too little on how that model will perform in production or whether the model fits well into the end-user’s existing workflow.
If you have the “best-performing” model in an isolated training and testing environment but fail to integrate well into the end-user’s existing workflow and fail… (3 years ago)
Blog post: Artificial intelligence is fueled by three major components - data, algorithms, and compute.
In this article, let’s explore data, including questions such as:
What is data?
What are some of the applications of data across an organization?
Who are the end users of your data?
How do you get started?
The Economist, The New York Times, and Wired all refer to data as the oil of the digital era.
According to AI luminaries such as Andrew Ng, the co-founder… (3 years ago)
Blog post: Hello readers —
Starting this week, I will provide an editorial on an important topic in artificial intelligence every week instead of my usual curated list of AI news. Readers have asked for this type of editorial to help them better understand how the latest developments in AI will affect them, both in business and in their personal lives. My goal is to inform, to inspire, and to elicit good conversation on all things AI with this editorial, and I would love your feedback along the way. (3 years ago)
Blog post: Here are five important AI stories from the week.
How to Build an AI Factory (ZDNet)
In their soon-to-be-released book, Competing in the Age of AI, Harvard University professors Marco Iansiti and Karim Lakhani discuss the four most critical ingredients to building an AI-first company.
Invest in a well-functioning data pipeline to clean and integrate data.
Develop or modify algorithms for your tasks, drawing upon supervised, unsupervised, and/or reinforcement learning… (3 years ago)
Blog post: Here are five important AI stories from the week.
RoBERTa, the Next Best Advance in NLP (Facebook)
Since Google launched BERT late last year, there have been several improvements along the way, such as OpenAI’s GPT-2 and XLNet. Facebook has launched its own improvement, RoBERTa, which produced state-of-the-art results on the most popular NLP benchmark, known as GLUE. To build this, Facebook made some adjustments to Google’s BERT architecture but also trained on more data and for longer… (3 years ago)
Blog post: Here are five important AI stories from the week.
Microsoft Bets Big on Artificial General Intelligence (The New York Times)
Microsoft invests $1 billion in the A.I. research lab founded by Elon Musk and now headed by Sam Altman, the former head of Y Combinator. OpenAI’s successes to date include releasing a very impressive language model called GPT-2; OpenAI made headlines earlier this year when it chose NOT to release the code because it feared releasing the code would lead bad… (3 years ago)
Blog post: Here are five important AI stories from the week.
The Future of AI Is Unsupervised (MIT Technology Review)
Today’s machine learning applications need a lot of labeled data to have good performance, but most of the world’s data is not labeled. For machine learning to advance, algorithms will need to learn from unlabeled data and make sense of the world from pure observation, much like how children learn to operate in the real world after birth without too much guidance. … (3 years ago)
Blog post: Here are five important AI stories from the week.
AI Conquers Multi-Player Poker (Facebook)
Facebook and Carnegie Mellon have built the first AI bot that is capable of winning multi-player poker (no-limit Hold’em), building upon the heads-up (two-player) success that Libratus demonstrated last year. Unlike chess and Go, poker is a game with hidden information - the AI cannot see the cards that are held by its opponents. This makes poker incredibly complex and challenging for an AI to win… (3 years ago)
Blog post: Here are five important AI stories from the week.
Europe Needs to Shape Up (European Council on Foreign Relations)
The U.S. and China are leading the AI race. Other players, such as Russia and India, are tier-two contenders. To compete with these AI powers, Europe has a lot of work to do. First, it needs to retain AI talent, which it so often loses to the U.S. Europe does not pay competitive enough salaries. Second, very stringent data privacy regulations are hampering access to… (3 years ago)
Blog post: Here are five important AI stories from the week.
AI to Measure Work Performance (The New York Times)
AI provides workers such as customer service representatives with real-time feedback on how to do their jobs better, nudging them to be more cheery, to slow down or speed up, and more. In some cases, the AI decides which workers are the worst performers and, therefore, should be let go. AI to empower and surveil employees is on the rise.
The Ubiquity of F… (3 years ago)
Blog post: Here are five important AI stories from the week.
An AI Film by WIRED (WIRED)
WIRED explores how AI is now becoming ubiquitous in everyday life around us in this 42-minute-long film. In just a few short years, AI has gone from an academic curiosity to something we cannot live without. I highly recommend watching this film.
An AI Film by Fortune (Fortune)
Budgets for AI are on the rise. This 12-minute-long film by Fortune explains just how AI transitioned from a niche… (3 years ago)
Blog post: Here are five important AI stories from the week.
AI to Fight Fake News (Allen Institute for AI)
With the major breakthroughs in the field of natural language processing over the past 18 months, it is much easier than ever before to generate fake—yet seemingly real—news. To study and detect fake news, the Allen Institute developed a model that both generates fake news and detects whether an article was written by its AI—called Grover—or a human. I highly recommend you check out… (3 years ago)
Blog post: Here are five important AI stories from the week.
A People-First AI Strategy (Wharton)
To have the most success implementing AI in the enterprise, people must feel empowered by AI, not threatened. People are the most important asset at most companies, and AI should support them. If AI remains a black box and is viewed as a substitute for humans rather than a complement to them, there will be substantial resistance to AI, limiting its adoption. We need a human-centric AI strategy to… (3 years ago)
Titles By Ankur A. Patel
Many industry experts consider unsupervised learning the next frontier in artificial intelligence, one that may hold the key to general artificial intelligence. Since the majority of the world's data is unlabeled, conventional supervised learning cannot be applied. Unsupervised learning, on the other hand, can be applied to unlabeled datasets to discover meaningful patterns buried deep in the data, patterns that may be near impossible for humans to uncover.
Author Ankur Patel shows you how to apply unsupervised learning using two simple, production-ready Python frameworks: Scikit-learn and TensorFlow using Keras. With code and hands-on examples, data scientists will identify difficult-to-find patterns in data and gain deeper business insight, detect anomalies, perform automatic feature engineering and selection, and generate synthetic datasets. All you need is programming and some machine learning experience to get started.
- Compare the strengths and weaknesses of the different machine learning approaches: supervised, unsupervised, and reinforcement learning
- Set up and manage machine learning projects end-to-end
- Build an anomaly detection system to catch credit card fraud
- Cluster users into distinct and homogeneous groups
- Perform semisupervised learning
- Develop movie recommender systems using restricted Boltzmann machines
- Generate synthetic images using generative adversarial networks
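The anomaly detection and clustering items above boil down to a small amount of scikit-learn code. The following is a minimal sketch, not taken from the book: synthetic data stands in for real credit card transactions, and the contamination value is an illustrative assumption about how many records are fraudulent.

```python
# Minimal sketch (not from the book): unsupervised anomaly detection with
# scikit-learn. Synthetic data stands in for real credit card transactions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))   # typical transactions
outliers = rng.normal(loc=6.0, scale=1.0, size=(10, 4))   # a few unusual ones
X = np.vstack([normal, outliers])

# contamination is an assumption about the expected fraction of anomalies
model = IsolationForest(contamination=0.01, random_state=42)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = normal
print(f"flagged {int((labels == -1).sum())} of {len(X)} records as anomalous")
```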
NLP has exploded in popularity over the last few years. But while Google, Facebook, OpenAI, and others continue to release larger language models, many teams still struggle with building NLP applications that live up to the hype. This hands-on guide helps you get up to speed on the latest and most promising trends in NLP.
With a basic understanding of machine learning and some Python experience, you'll learn how to build, train, and deploy models for real-world applications in your organization. Authors Ankur Patel and Ajay Uppili Arasanipalai guide you through the process using code and examples that highlight the best practices in modern NLP.
- Use state-of-the-art NLP models such as BERT and GPT-3 to solve NLP tasks such as named entity recognition, text classification, semantic search, and reading comprehension
- Train NLP models with performance comparable or superior to that of out-of-the-box systems
- Learn about Transformer architecture and modern tricks like transfer learning that have taken the NLP world by storm
- Become familiar with the tools of the trade, including spaCy, Hugging Face, and fast.ai
- Build core parts of the NLP pipeline--including tokenizers, embeddings, and language models--from scratch using Python and PyTorch
- Take your models out of Jupyter notebooks and learn how to deploy, monitor, and maintain them in production
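To give a flavor of the tools the bullets mention, here is a minimal sketch (not code from the book) that runs a pretrained named entity recognition pipeline from Hugging Face transformers; the library downloads its default NER model on first use, and the example sentence is illustrative.

```python
# Minimal sketch (not from the book): NER with a pretrained Hugging Face pipeline.
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")  # downloads a default model
text = "Ankur Patel worked at Bridgewater Associates before founding R-Squared Macro."
for entity in ner(text):
    print(entity["entity_group"], "->", entity["word"], round(float(entity["score"]), 3))
```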
Unsupervised learning could be the key to a more comprehensive artificial intelligence
Full of practical techniques for working with unlabeled data, written accessibly and with straightforward Python examples
Uses Scikit-learn, TensorFlow, and Keras
Much of the world's available data is unlabeled. The supervised learning techniques that are widely used in machine learning cannot be applied to this unclassified data. Unsupervised learning, on the other hand, can be applied to unlabeled datasets to discover meaningful patterns buried deep in the data, patterns that are nearly impossible for humans to uncover.
In this book, Ankur Patel uses concrete examples that can be implemented quickly and effectively to show data scientists how to apply unsupervised learning to their data. You will learn how to tease out hard-to-find patterns and thereby gain deeper insight into business processes, for example. You will also learn how to detect anomalies, perform automatic feature engineering, and generate synthetic datasets.