

VALUE JOURNEY OF CONTENT

This post tracks the journey of a piece of content from its inception, through its worth to the business, to its eventual death from viewership.

Birth

Content is born the same way manifestos come into this world: as an idea. In the writer's mind, it incubates like the alien from the movie 'Aliens'. The incubation period varies from case to case, depending on the depth of the concept, the difficulty of implementation, or simply the need for further research. The idea is born out of preset parameters designed to bring value to a business. The more creative the route of sale, the better.

Process & Route of Sale

Once the idea has sufficiently simmered in the writer's mind, he or she begins the process of putting pen to paper. Or, really, fingers to keys. The first step in writing content to attract business (and deliver value for search engine optimization) is to make a list of keywords. How do we come up with keywords? By looking at search intent. Consider what a user might search for and how it is relevant to the idea we are going to write about. Relevance is important here because you are up against Google's dynamic, artificially intelligent engine, which keeps a strict and watchful eye on how much value your piece will bring to the user searching for a particular keyword string. Once the keywords are decided upon, the next step is to write the piece. The best and most impactful way to write is in the form of a story. The basics are easy, and the best writers know this: they have abstract protagonists going through eight-stage character development arcs while writing about the state of the financial markets.

Value to Business

Information is the foundation of our world. Anything we want to know is just a couple of keystrokes away. And the primary peddlers and curators of this information are search engines.
So if a business wants to sell a service or a product and wants to let the online world know, it has to do so through the search engine. Of course, advertisements and social media strategies can help as well, but neither compares to the traffic that search engines (mostly Google) can bring. Google has been around long enough now for users to be wise to when they are being advertised to. And most users know that the businesses at the top of Google AdWords have the deepest pockets. Statistics indicate that a high organic ranking attracts 50% more click-throughs than the top AdWords listing. So SEO is a very important part of the value a piece of content brings to a business's growth. But is that all? No. The true value of content is the scale of communication. Good content has relevance far beyond SEO and SERP rankings. It is the business's (read: brand's) primary vehicle of growth and development. It sits at the top of your conversion funnel, a singular effort akin to shouting out to the world from the top of a mountain: the business's first round of engagement with any customer.

Death, Revival and Sustained Value

Our lives are tainted by the expectation and certainty of death. But what of information; can it die? As any good SEO specialist will tell you: just look at all the content on the second page of Google's search results. Of course content can die. The thing about relevance is that it changes. The context of the user changes. Their needs, their wants, and the things they search for change constantly. But does that mean all the effort a writer puts into a piece has only fleeting value? Of course not. The idea is to have a content strategy that regularly updates and republishes refurbished content. Another way to keep content alive and valuable is to consistently reference older content in new pieces, making sure that everything is tidy and the flow of information is linear.
Maybe we cannot defeat death, but we can very well make certain that we gain and retain control of the fate of the content we produce! -by Havaz Mhd

SEO: SEARCH INTENT AND ITS MANY FACES

What is Search Intent

Search intent, also known as keyword intent or user intent, is what the user intends (or intuitively expects) to find on the search engine results page (SERP) after entering a query. Let's say you are a user, and you want to figure out the quickest way to join the circus. You run a search on Google for "quickest way to join the circus". It shows you many links for how to join, but most of them are long roads that will take you years. But on one website you see a quick route that would take only six months. So you stay on that page, read it, and engage with it. If enough people do this after searching for the same query, that website will soon find itself ranking #1 on the SERP.

The Importance of Intent

SEO IS intent. Of course keywords, credibility, and authenticity all affect how you rank on the SERP. But Google's primary challenge is to give its users a seamless experience. And you can bet they will use their considerable resources and advanced technology to achieve this. There is nothing worse for a search engine than failing to recognize the intent of the user's search. Google is an SEO specialist's greatest friend as well as their greatest nemesis. Its oft-touted, expansive, and ever-changing algorithm presents the truest challenge of SEO work: matching search intent with results on the Search Engine Results Page (SERP). There is no denying that a website's position on the SERP directly contributes to business volume. If your business were a house, then the content (and its resulting SEO) would be the water that holds the cement together. Except your business is actually a plant, and it needs constant watering.

How Do We Discern Intent

As it turns out, 99% of all search terms fall under four intent categories: informational, navigational, commercial, and transactional. Data also suggests that search intent changes constantly. Nobody knows what the future of search is.
Add to that Google's ever-dynamic search algorithm, and SEO becomes a tough nut to crack. So if a business is selling a product online (a commercial endeavor), the practiced logic is to construct long- and short-tail keywords around the words people would search for. Essentially, the content creator is left to guess what keywords a searcher would use. Pen-swinging marketers from big corporations have already cornered most SERPs with extensive blog production and keyword stuffing. This, coupled with the inherent uncertainty of search, makes establishing a SERP presence an uphill task indeed. But it is not all doom and gloom for the creative marketer in this story. There is an inbuilt deus ex machina here: Google. As mentioned twice here already, Google's algorithms are ever-changing and use deep learning AI technology to dynamically discern the intent of a user, as well as the relevance of a website in any particular search scenario. How do you counter a deep learning algorithm that indexes your website based on search intent? You deploy your own deep learning algorithm at the production level. You reverse engineer intent using hard data. To do this, the content production process has to be refined and data-driven. Check out Instoried's content analysis tool, which uses Natural Language Processing and AI to augment your writing. Google does say that it prioritizes fresh content, but that just means you need a well-developed content strategy that is updated regularly. It also means that someone starting from scratch or building an online presence can have a fighting chance, as long as they figure out intent. -by Havaz Mhd

Emotions- Key to a Successful Campaign

Emotions are the one crucial element of digital advertising that supersedes responsive ad formats, cross-device presence, and the like. The real long-term connection between a brand and its audience is driven by emotions, meaning that brands must trigger a positive emotional reaction in a consumer to gain access to their wallets. While this concept is not necessarily new, the importance of understanding the sentiment that content on a page can provoke when placing digital ads continues to increase, and with good reason. Emotional ads outperform on almost all metrics, including profitability, and if they elicit strong feelings they are twice as likely to be shared on social media. By putting emotion at the heart of campaigns and ensuring that ads are well placed, brands can enhance engagement and forge a lasting bond with their audience.

The most successful brands have earned audience recognition and loyalty through their ability to convey a simple and powerful emotional message in advertisements, from search to display and video. Coca-Cola, for example, has cut through the noise with one word: happiness. Its "Choose Happiness" campaign links the brand with a basic emotional need and empowers consumers to feel that they can achieve and spread happiness by purchasing its product. Despite the power of emotive ads, their message becomes redundant if they are not placed in the most relevant context to ensure maximum impact. For example, if a user searches for festive family activities, the most effective ads accompanying the results will be those that reflect the positive emotions associated with family bonding. The technology goes beyond simplistic keyword detection to uncover the true meaning of different words according to their context, allowing advertisers to assess not only the sentiment but also the emotional context of individual web pages. Emotion is becoming the greatest currency in digital advertising.
But no matter how emotive and creative an ad is, consumers may bypass it if it is not placed beside content in the correct context. Understanding the emotion in ads and placing them in the most effective context is vital to creating a truly engaging and immersive advertising experience. Instoried Research Labs, a Bengaluru-based startup, builds AI-powered products that test the effectiveness of your content and offer smart recommendations to create smart copy that maximizes the impact of your ad campaign. Sharmin Ali, CEO of Instoried Research Labs, says, "Content has come a long way since its inception. We currently live in an age where emotions have become one of the main drivers of buying behavior. My team has put together an AI-enabled tool that can change the way you write and help you communicate with your consumers better than before. At Instoried, we test the effectiveness of content and help brands reach brand goals, i.e. create a brand presence." Check out Instoried here. Sign up for their demo. -by Vaibhav Venu

Effective Content Testing for a Fast Fashion Retail Brand to Drive ROI

THE PROBLEM

A high-end fast fashion retailer from London was blindly increasing its annual marketing spend year over year. However, consumer views of and interest in its digital media platforms were continuously falling. The company reached out to us at Instoried to help improve content consumption and drive ROI across multiple geographies.

THE APPROACH

We first tried to understand the company's basis for content creation and the corresponding content consumption rate among its target customers, across multiple content formats (text, images, and videos). We found that all of its content creation was based on intuition: no customer-centric, data-driven approach was employed to understand whether the content would even stick with its audiences and drive ROI.

THE SOLUTION

Using our content A/B testing approach, we designed a set of 5 metrics that were important for the company to understand user sentiments and reactions towards its content. The company achieved a 3.5% increase in click-through rate in less than a quarter, which drove a 1.5x increase in ROI. -by Sharmin Ali

How to implement a Data-Driven Content Strategy

It is becoming increasingly obvious that in the information age we live in, data-driven businesses win. Naturally, that extends to content marketing as well. But how would it work? One very big problem with a fully data-driven approach to content marketing is this: the writer is human, and so inherently subjective in their writing. So it follows that content creation, being a creative process, cannot be fully data-driven. But it can get pretty close to the finish line, thanks to huge advances in AI and deep learning technology. Here's a brief description of how.

The first step in any data-driven process is tracking. SMEs and enterprises churn out huge amounts of content every day in pursuit of their online marketing efforts. But how is this content performing? And why?

Performance Tracking

There are many tools in the market that enable performance tracking, Google Analytics being a primary example. A marketer needs to identify which performance metrics (KPIs) indicate success. This is dynamic and will change depending on what he or she is trying to achieve with the content. If they are writing a long-form blog post intended to inform readers about a particular topic (and thereby achieve search engine optimization), a good KPI would be time spent on page. Click-through rates (CTRs) also indicate how much the user has engaged with the content. Conversely, if the marketer is trying to incite an emotional response from the reader, social shares will show how effective the content is and what kind of impact it had on the user. Here is an extensive list of metrics that Google Analytics provides to help track your content performance. Another good way for enterprises to measure performance is through conversion rates from marketing qualification to sales engagement. This works well as a measure of success for a B2B enterprise trying to establish an online presence and strengthen its top-of-the-funnel (TOFU) sales pipeline.
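To make the tracking step concrete, here is a minimal sketch of how the KPIs above could be aggregated from raw visit data. The record fields and numbers are purely illustrative; real data would come from an analytics tool such as Google Analytics.

```python
from dataclasses import dataclass

# Hypothetical page-visit records; the field names are illustrative only.
@dataclass
class Visit:
    seconds_on_page: float
    clicked_cta: bool   # did the visitor click the call-to-action?
    shared: bool        # did the visitor share the piece socially?

def content_kpis(visits):
    """Aggregate a few of the KPIs discussed above from raw visit data."""
    n = len(visits)
    return {
        "avg_time_on_page": sum(v.seconds_on_page for v in visits) / n,
        "ctr": sum(v.clicked_cta for v in visits) / n,
        "share_rate": sum(v.shared for v in visits) / n,
    }

visits = [Visit(120, True, False), Visit(45, False, False),
          Visit(300, True, True), Visit(10, False, False)]
print(content_kpis(visits))
```

Which KPI matters most depends on the goal, as noted above: time on page for informational posts, share rate for emotionally driven pieces.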
So, using internal KPIs, we have performance results. Now, what do we compare them with?

Pre-Publishing Analysis

How do you measure the quality or worth of written content? Of course, there are quantitative metrics like word count, grammar, spelling, etc. that measure the basic quality of writing. But these are too impersonal and removed from why a piece of content finds success. To find truly qualitative data that can be used to run comparisons, we need to quantify how words relate to each other and how people react to them. Enter Natural Language Processing, or NLP. At Instoried, we use NLP to measure emotion, using five primary emotions (Joy, Anger, Sadness, Surprise, and Fear) and three tonal metrics (Positive, Negative, and Neutral) as the primary indices that digest a whole page of content into numbers that can be compared against each other or against preset benchmarks. Here's a series of blogs on how we did this.

Writing Augmentation

Instoried's tool analyzes the words used in the content and recommends contextually aware changes that affect the emotional valences, and thereby the final performance.

Comparative Analysis

So now let's look at the data available to us: market performance results (e.g. time spent, duration of visit, CTR, bounce rate) and pre-publishing emotional analysis (Joy, Anger, Fear, etc.). Now we have both sides of the coin, and this unlocks a coveted tool in business: predictability. Using historical and industry-specific competitive data, we compare market performance against the emotional analysis and digest which emotional combination gives which results. We also arrive at benchmarks that can be repeated and scaled (by predicting performance) through changes to the content, either manually or by virtue of machine-learning-enabled writing augmentation. Instoried uses an AI model that also tries to figure out the why, i.e. how these emotions affect market performance.
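A minimal sketch of the comparative step: each piece is reduced to a distribution over the five emotions, then scored against a benchmark built from past high performers. The benchmark values and distance measure here are invented for illustration, not Instoried's actual model.

```python
# Hypothetical benchmark mix derived from historically high-performing content.
BENCHMARK = {"joy": 0.40, "anger": 0.05, "sadness": 0.10,
             "surprise": 0.35, "fear": 0.10}

def emotional_distance(piece, benchmark=BENCHMARK):
    """L1 distance between a piece's emotion mix and the benchmark mix.
    Lower means the piece is closer to the profile of past winners."""
    return sum(abs(piece[e] - benchmark[e]) for e in benchmark)

draft_a = {"joy": 0.50, "anger": 0.05, "sadness": 0.05,
           "surprise": 0.30, "fear": 0.10}
draft_b = {"joy": 0.10, "anger": 0.30, "sadness": 0.30,
           "surprise": 0.10, "fear": 0.20}

print(emotional_distance(draft_a))  # closer to the benchmark
print(emotional_distance(draft_b))  # further from the benchmark
```

Pairing a score like this with the market KPIs for each published piece is what makes the performance-versus-emotion comparison, and hence the predictability, possible.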
But that is a very technical question with a very technical answer, and it is for another day. For our purposes here, we use the correlations between the two to scale and repeat our best-performing content, optimizing for our internal KPIs and maximizing business value for creative effort! -by Havaz Mohammed

Emotion Recognition and Analysis for Marketing Content

BACKGROUND

ABC is a large FMCG company based out of Singapore. Growing at an extremely fast rate and increasing its content spend on a monthly basis, the company seeks a strategy to increase brand recall and customers' purchase intent across its product portfolio. ABC wanted to find innovative ways to create better marketing content.

ASSESSMENT

Company ABC used to evaluate its content production according to its performance in the market (post-publishing). This made constructive feedback loops a nightmare, because the written word had no quantifiable metrics, meaning the company had to rely on the individual, and hence subjective, judgment of writers. This was unacceptable, because the company wanted a data-oriented approach and put a lot of stock in its ability to apply and process its data.

CHALLENGES FACED

During the preliminary assessment, our team identified the following challenges at company ABC:

Problem 1: Content creation was outsourced to a content agency, and the process was based on intuition, not backed by any data.

Problem 2: There was no technology available that could measure the emotional quotient of the content and help the marketer validate whether it would strike a rapport with consumers.

Problem 3: Many of the metrics collected internally at the organization were inaccurate, which left room for human errors and other inefficiencies.

Fig: Important Metrics for ABC

CTR: the share of people who view an ad and actually click on it. CTR for ABC varied from 0.05% to 1% for the Google Display Network and Search Network respectively, which is relatively low compared with industry standards. The second important metric is bounce rate. Bounce rate for ABC was at 75%, with stay at 25%.
SOLUTION

Instoried's emotional intelligence opens the door to quantifying writing as a probability distribution (or composition) of five primary emotions (Joy, Anger, Sadness, Fear, and Surprise), each its own comparative data point. Together with our sentiment analysis, this gives us unique pre-publishing metrics for analysing performance. Coupling this with our recommendation engine's capability to suggest changes in the writing that influence the composition allowed company ABC to increase its CTR and decrease its bounce rate significantly.

Fig: Process followed by Instoried

RESULT

As a result of the analysis performed and the content created using our tool, company ABC saw CTR increase to 0.47% and 2.18% for the Google Display Network and Search Network respectively within 3 months. ABC's bounce rate dropped to 40%, and stay increased to 60%.

Fig: x-axis: Time, y-axis: CTR; Google Display Network
Fig: x-axis: Time, y-axis: CTR; Search Network
Fig: Earlier
Fig: After

- by Sharmin Ali & Havaz Mohammad

Instoried Tech - There is a Secret Sauce!

Introduction

In Part II of our ongoing discussion regarding technological capacity building at Instoried, we will focus on how we accomplished the goal of performing sentiment analysis and spellcheck for text written in Hindi. We will also introduce a novel "readability" analyzer, which predicts how hard a particular sentence is to read for an average reader. Finally, as stated in Part I of this series, we will mention some open-source libraries our NLP team is working on, which are under rapid development. Credit: Rashmi Ghosh.

Dataset

As with all things Deep Learning, our initial focus was on creating a large, context-relevant, and sufficiently diverse dataset. This is absolutely essential for Language Modelling (LM), wherein we train a model to understand the semantics, syntax, context, etc. of the underlying language. We also created another tagged dataset for the classification task, i.e. sentiment and emotional analysis. The dataset for the LM task was scraped from a variety of books written in Hindi on a multitude of topics - history, politics, science, etc. The size of the dataset was ≈ 9.7 GB. The dataset for the classification task was scraped from a number of sources including, but in no way limited to, movie reviews, news articles, Twitter, and product descriptions. We finally tagged the ≈ 60,000 data-points for both sentiment and emotion.

Pre-processing

Once the datasets had been built, they needed to be "cleaned." Apart from removing stop words and special characters, we also wanted to ensure that there was no transliterated text in the dataset. This was done to ensure that all the data-points contained only Hindi content. Having achieved that, we focused our attention on NER and POS tagging. After a lot of trials and tribulations, we finally managed satisfactory results using a modified FLAIR multi-lingual model.
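One simple way to implement the transliteration filter described above is to keep a sentence only if most of its letters fall in the Devanagari Unicode block (U+0900-U+097F). This sketch and its 90% threshold are assumptions for illustration; our production pipeline may differ.

```python
def devanagari_ratio(text):
    """Fraction of alphabetic characters that are Devanagari letters."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum('\u0900' <= c <= '\u097f' for c in letters) / len(letters)

def is_hindi(text, threshold=0.9):
    """Accept a data-point only if it is predominantly written in Devanagari."""
    return devanagari_ratio(text) >= threshold

print(is_hindi("यह एक हिंदी वाक्य है"))              # Devanagari script: kept
print(is_hindi("yeh ek transliterated vaakya hai"))  # Latin script: dropped
```

A character-ratio check like this is cheap enough to run over the full ≈ 9.7 GB corpus and catches romanized (transliterated) Hindi as well as stray English sentences.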
This little breakthrough would prove instrumental in tackling the spellcheck functionality, as we were now able to handle the class of nouns efficiently. Finally, we experimented with a number of tokenizers - Punkt, TreebankWord, spaCy, etc. - to find the one most suitable for handling vernacular (i.e. Indic) languages. We found SentencePiece to be the most suitable candidate. Now that we had these pieces of the puzzle in place, it was time to fire up those GPUs!

Language Modelling

Since there are no pre-trained Hindi language models available online with a "permissive" (i.e. MIT, Apache) license that have been trained on a large enough dataset, we decided to train our own. After weighing the pros and cons of various language modelling architectures - BERT, GPT-2, XL-Net - we decided to go with GPT. Some of the features in its favor are: (i) the ability to generate embeddings from the trained language model, which would be useful in classification tasks; and (ii) the amenability of the model to trimming / pruning, so as to reduce inference time. Now, training on ∼ 9 GB of data, even after parameter optimization, is very computationally intensive. But once we started testing the final trained model, the exercise proved to be well worth the time (and money) spent! The results showed a very intricate understanding of the syntax and semantics of the Hindi language, and seemed to lack any bias. The perplexity score of the model was 37. Certainly a lot of scope for improvement, but once we had our language model in place, it was time to deploy it for the classification task.

Classification

There are two classification tasks of utmost importance to us - (i) sentiment analysis, i.e. positive, negative, and neutral; and (ii) emotional analysis, i.e. joy, surprise, anger, fear, and sadness.
Continuing our desire to ensure uniformity of models / architectures across the different modules, we went about training the GPT model on the tagged dataset for the purposes of sentiment analysis. As before, we used Stochastic Gradient Descent with Nesterov Momentum. To ensure faster convergence, we used the 1-cycle policy, complementing it with the results of LR-Finder for the optimal learning rate. Our final model had an F1-score of 0.81, and the AUC score was 0.77. We are continuously striving to improve these metrics. And in order to find the words that have a high correlation with a particular emotion, we experimented with the Captum library for PyTorch. That enabled us to properly interpret our models, thereby ensuring higher (self-) confidence in our results.

Readability

Now, for the exciting part; what we believe to be a marked contribution to the progress of vernacular NLP - the Readability Analyzer! Using TF-IDF on a down-sampled version of the large dataset, 500 MB in size, we plotted a word-frequency graph. This helped us create a table of words linked to their general usage (i.e. how (in-)frequent their occurrence is over a large enough dataset). This was used as a metric, along with others, to determine how hard a particular sentence is to read, thereby helping a writer gauge the reading level required to understand their content. Finally, the same sampled dataset was used to create a corpus of unique words, which facilitated our implementation of the spellchecker. These words were stored in a highly optimized dictionary, to enable deployment in real-time scenarios. Also, we were able to provide recommendations for misspelt words owing to the embeddings generated in-house whilst conducting the GPT-based language modelling. Don't you just love it when such disparate things fall so beautifully into place!
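The frequency-table idea behind the readability analyzer can be sketched in a few lines: rarer words (low corpus frequency) push a sentence's difficulty score up. The toy English corpus and the mean negative-log-frequency scoring below are illustrative stand-ins, not our production metric.

```python
from collections import Counter
import math

# Toy stand-in for the down-sampled corpus used to build the frequency table.
corpus = "the cat sat on the mat the cat ran the dog sat".split()
freq = Counter(corpus)
total = sum(freq.values())

def difficulty(sentence):
    """Mean negative log-frequency of the sentence's words; higher = harder.
    Unseen words get a +1 smoothing floor so the log is always defined."""
    words = sentence.split()
    return sum(-math.log((freq.get(w, 0) + 1) / (total + 1))
               for w in words) / len(words)

print(difficulty("the cat sat"))         # common words -> low score
print(difficulty("ontology epistemic"))  # rare words -> high score
```

In the production version, scores like this (combined with other metrics) map to a reading level, and the same unique-word table doubles as the spellchecker's dictionary.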
Conclusion

So, as can be seen, we have made a lot of progress with respect to objectively understanding the Hindi language. This is a very positive first step in building a comprehensive tool for vernacular languages. As for next steps, our readers can expect two major updates to our Hindi tool in the short term: (i) emotional analysis of content, along with highlighting of the words contributing to a particular emotion; and (ii) basic grammar check, subsequently providing a subset of possible corrections. For the English tool, readers can look forward to two major updates in the mid-term: (i) added automation to help users make optimized decisions with regard to the recommendations; and (ii) an in-house speech-to-text analyzer for any audio format uploaded to our tool. We're very excited about what the future holds. In conclusion, I would like to mention three GitHub repos our NLP team makes use of in its work: DVC-Download, which helps practitioners keep track of their datasets (and other large files) using DVC and S3 buckets; Classification-Report, which helps you visualize a model's weights, biases, gradients, and losses during training; and Multi-label Classification, a compendium of notebooks that guides users through working with datasets with multiple tags. Till our next blog, keep writing! And let the emotions flow. - by Sutanshu Raj #Hindi #Vernacular #DeepLearning #NLP #Readability #Sentiment #Marketing

Instoried Tech - Is there a Secret Sauce?

Background

What are some of the biggest hurdles that Natural Language Processing / Understanding (NLP/U) practitioners, especially in industry, have to deal with? Until a year or so ago, it was the lack of progress, compared to the cousin field of Computer Vision, in robust deep-learning architectures that could carry out complicated and nuanced classification tasks (such as emotional analysis and sentiment analysis). With the advent of Transformers, and models like BERT, GPT-2, and XL-Net built upon that architecture, there now exists a plethora of models to choose from, no matter what task you wish to accomplish. But these models brought with them their own set of challenges, mainly deployment. How do you put such deep models into production and still have a response time of < 700 ms? Luckily for us, this question too has been answered over the last few months, with researchers developing methods to distill and / or prune these huge NLP models, so as to make them amenable to practical purposes. Despite all this progress, there is still potential for improvement, both in accuracy metrics and in reducing the space complexity of these models without affecting their performance. This is the story I want to tell.

Dataset

We at Instoried, around 6 months back, took our first concrete step towards tackling the specific problem of accurately calculating emotional and sentiment valence scores for written text, along with providing contextually-aware recommendations (and not just synonyms / antonyms) for the words and phrases the model predicts are maximally correlated with a particular emotion. To this end, we started off by collecting data … obviously.
This tends to be a laborious task, as any data engineer will recognize, but it is also the most important, as without a thorough understanding of the data, its distribution, its t-SNE plots, etc., one can never be sure about any biases or errors which may have crept into the dataset. For the purposes of tagging, we decided to narrow down to the following subsets: for sentiment - positive, negative, neutral - and for emotion - joy, surprise, anger, fear, sadness - based upon certain principles of neuro-marketing. We were lucky enough to find a few publicly available datasets, though not completely tagged, but the majority of our dataset (i.e. ≈ 0.75 million data-points) was collated using scraping code written in-house, followed by data augmentation. After the customary cleaning-up and pre-processing of the dataset, it was time for the interesting part of our job: training the NLP models!

Baseline

We started off by testing some traditional Machine Learning (ML) methods, in order to establish a baseline. Irrespective of the result of the baseline, the accuracy could only get better from there. As is usual, we began with our good ol' friend, logistic regression. Generally overlooked in the last few years, especially since the resurgence of Deep Learning (DL) methods, it is still a robust method and gives very good results (sometimes even better than more complex methods). We tested a number of tokenizers for fitting our training data - TF-IDF, SentencePiece, Punkt - whilst using SVMs for the classification tasks. This gave us a fair idea of how they performed with respect to our dataset. Then we tried a slightly more advanced classification method: the LSTM. The bi-directional variant of the LSTM, along with the Attention mechanism, gave us some promising initial results. Attention proved useful in understanding the correlation between words and emotions.
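To make the baseline stage concrete, here is a deliberately tiny stand-in: bag-of-words features with logistic regression trained by plain gradient descent. Our actual baseline used TF-IDF / SentencePiece features with SVMs on the real dataset; the four training sentences below are invented.

```python
import math
from collections import defaultdict

# Toy sentiment data: 1 = positive, 0 = negative. Invented for illustration.
train = [("what a joyful wonderful day", 1), ("i love this happy news", 1),
         ("this is sad terrible news", 0), ("a dreadful awful day", 0)]

def featurize(text):
    """Bag-of-words: map each word to its count in the text."""
    f = defaultdict(float)
    for w in text.split():
        f[w] += 1.0
    return f

w, b = defaultdict(float), 0.0
for _ in range(200):                       # epochs of per-example SGD
    for text, y in train:
        x = featurize(text)
        z = b + sum(w[k] * v for k, v in x.items())
        p = 1.0 / (1.0 + math.exp(-z))     # sigmoid
        g = p - y                          # gradient of log-loss w.r.t. logit
        b -= 0.1 * g
        for k, v in x.items():
            w[k] -= 0.1 * g * v

def predict(text):
    z = b + sum(w[k] * v for k, v in featurize(text).items())
    return 1 if z > 0 else 0

print(predict("such a happy joyful moment"))   # emotionally positive words
print(predict("terrible awful sad moment"))    # emotionally negative words
```

Even a model this simple separates toy data cleanly, which is exactly why a baseline like this is worth running before reaching for LSTMs or Transformers.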
But we wanted to dig "deeper."

Training & Results

A drawback of LSTMs is their inability to handle long-range dependencies. As subscribers to the adage - One Model to Rule Them All - we were determined to find the one model that would do all our classification tasks, along with handling the syntax and semantics of sentences. In walks … BERT. For training purposes, we used Stochastic Gradient Descent with Nesterov Momentum, and to ensure faster convergence, we used the 1-cycle policy. As the model was going to be deployed from a single K80 GPU machine and we wanted to ensure seamless, real-time results, we explored several optimization techniques: (i) data parallelization of both the input and the output of the model; and (ii) using DistilBERT for compression, which gave us fewer parameters with similar results. This proved to be a fun engineering challenge, and the end result was something the entire team was happy with. We decided to put that model into production for v1.0 of our tool / product. To further improve the classification, we used Label Smoothing with a KL Divergence loss. Finally, the F1-score of our algorithm on the classification tasks is 0.95 and the AUC score is 0.933, ensuring quality results.

Future Work

At this point, it is imperative for me to point out the significant help we have taken from open-source (OSS) libraries / code / blogs, building upon them to achieve our goals. In the same spirit, we too are working out concrete ways to give back to the community; I shall share those with you soon. As you may have noticed, this blog focused primarily on our English language tool and the NLP research aspect of our work. In an upcoming blog, I will give you an overview of the Hindi tool and what it takes to deploy DL models on the cloud. So do stay tuned. Till then, let the emotions flow! Cheers. - by Sutanshu Raj #Instoried #NLP #DeepLearning #TechBlogs #Startups
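The label-smoothing trick mentioned above can be sketched in plain Python: the one-hot target is softened, and the loss becomes the KL divergence between the smoothed target and the model's predicted distribution. The epsilon and the example probabilities are illustrative, not our training values.

```python
import math

def smooth(one_hot, eps=0.1):
    """Soften a one-hot target: the true class keeps most of the mass,
    the rest is spread uniformly over all classes."""
    k = len(one_hot)
    return [(1 - eps) * t + eps / k for t in one_hot]

def kl_div(target, pred):
    """KL divergence D(target || pred); pred must be strictly positive."""
    return sum(t * math.log(t / p) for t, p in zip(target, pred) if t > 0)

target = smooth([0, 1, 0])                 # e.g. "joy" out of 3 classes
confident = [0.05, 0.90, 0.05]
overconfident = [0.001, 0.998, 0.001]
print(kl_div(target, confident))
print(kl_div(target, overconfident))       # penalized for over-confidence
```

Because the smoothed target keeps a little mass on the wrong classes, predictions that collapse to near-certainty incur a higher loss, which discourages over-confident models and tends to improve calibration.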
