According to Forbes, data consumption increased almost 5,000% from 2010 to 2020—that translates to 59 trillion gigabytes per annum. Statista projects that number to reach a staggering 181 zettabytes in 2025. Per CIO, up to 90% of this data comes in an unstructured form, including text, necessitating various AI techniques, such as natural language processing (NLP), to extract value from this information.


Sentiment analysis, sometimes referred to as opinion mining, is a subset of NLP that performs text analysis to uncover overall attitude, emotion, opinion, and tone expressed by the author. It aims to determine whether the text conveys negative, positive, or neutral sentiment toward the subject being described.




By leveraging sentiment analysis, management accountants can gain a competitive edge in the ever-changing world of finance, identifying emerging trends and revealing valuable insights from financial data. It empowers analysts, investors, and regulators to better understand companies’ financial filings, social media and blog posts, news coverage, and customer feedback. The list of business use cases for sentiment analysis in finance is poised to grow as more data sources become available.


  • Stock market analysis. Negative sentiment toward a company can signal potential investors to stay away from its stock; similarly, sentiment toward specific market segments could inform buy or sell decisions.
  • Fraud detection. Additional scrutiny over abnormal financial transactions and emerging patterns in such activities can help prevent fraud.
  • Business functions management. KPMG and Grant Thornton pioneered the field of capturing unstructured data sources to improve clients’ contract management, invoicing, accounts payable, and other processes.
  • Compliance monitoring. Targeted review of relevant data signals could flag potential violations, enabling regulators to address them sooner.
  • Risk management. Negative sentiment for a loan applicant could help banks avoid risky loans.
  • Customer service and reputation management. Tapping into voice-of-consumer analysis from social media and company reviews could help financial institutions improve their customer experience.




Text preprocessing is the first step: formatting the text consistently, removing unnecessary elements, and performing other tasks that reduce noise in the data. Common techniques include the following:


  • Text cleansing: deletion of distractors, such as punctuation, URLs, HTML code, and special characters.
  • Tokenization: breaking down the text into individual words or phrases to perform sentiment analysis on individual tokens.
  • Part-of-speech tagging: a process of assigning a tag to individual words based on their grammatical function (e.g., noun, adjective, verb).
  • Stop word removal: removal of words that don’t add meaning (e.g., “the,” “a,” “in,” “and”).
  • Stemming: reduction of a word to its root/stem to treat word variations in the same way.
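As a minimal sketch, the preprocessing steps above might be combined in Python as follows. The stop-word list and the suffix-chopping stemmer are deliberately tiny, illustrative stand-ins for production resources such as NLTK’s stop-word corpus and the Porter stemmer:

```python
import re

# A tiny illustrative stop-word list; real pipelines use much larger ones.
STOP_WORDS = {"the", "a", "an", "in", "and", "of", "to", "is"}

def preprocess(text: str) -> list[str]:
    # Text cleansing: strip URLs, HTML tags, punctuation, and special characters.
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"[^a-zA-Z\s]", " ", text.lower())
    # Tokenization: split on whitespace.
    tokens = text.split()
    # Stop word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Naive stemming: chop common suffixes (real systems use Porter/Snowball stemmers).
    def stem(word: str) -> str:
        for suffix in ("ing", "ed", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word
    return [stem(t) for t in tokens]

print(preprocess("The company reported <b>record</b> earnings, beating estimates! https://example.com"))
# → ['company', 'report', 'record', 'earning', 'beat', 'estimate']
```

Each line of the function maps onto one of the bullet points above; part-of-speech tagging is omitted because it requires a trained tagger.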


Feature extraction is the next step: identifying relevant keywords, phrases, or other elements that will help predict sentiment for social listening purposes. Features of interest include company name and ticker, as well as sentence length, word frequency, and punctuation count.
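A hypothetical illustration of extracting such surface features in Python; the `watchlist` of company tickers is an assumption introduced for the example:

```python
import string
from collections import Counter

def extract_features(text: str, watchlist: set[str]) -> dict:
    """Extract simple surface features that could feed a sentiment classifier.

    `watchlist` is a hypothetical set of company names/tickers to scan for.
    """
    words = text.lower().split()
    cleaned = [w.strip(string.punctuation) for w in words]
    return {
        "mentions": sorted(w.upper() for w in cleaned if w.upper() in watchlist),
        "sentence_length": len(words),  # word count of the snippet
        "word_frequency": Counter(cleaned),
        "punctuation_count": sum(ch in string.punctuation for ch in text),
    }

features = extract_features("ACME stock surged today! Analysts upgraded ACME.", {"ACME"})
print(features["mentions"], features["punctuation_count"])  # → ['ACME', 'ACME'] 2
```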






The third step is sentiment classification, which assigns a polarity score to each feature. This score ranges from negative (-1) to neutral (0) to positive (+1). Various approaches to handle classification include the following:


1. Machine learning-based. Machine learning-based sentiment analysis automatically classifies text attributes and involves training an algorithm to identify relationships and patterns within labeled text data. Such data comes from text documents that typically were manually prelabeled with positive, neutral, or negative sentiment.
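As an illustrative sketch, a minimal Naive Bayes classifier can be hand-rolled in a few lines of Python. The four prelabeled training sentences are invented for the example; production work would use a library such as scikit-learn trained on a much larger labeled corpus:

```python
import math
from collections import Counter, defaultdict

# Hypothetical prelabeled training data; real corpora contain thousands of documents.
train = [
    ("revenue growth beat expectations", "positive"),
    ("strong quarterly profit and upbeat guidance", "positive"),
    ("shares plunged after weak earnings", "negative"),
    ("lawsuit and declining sales hurt outlook", "negative"),
]

# Count word occurrences per label (multinomial Naive Bayes with add-one smoothing).
word_counts = defaultdict(Counter)
label_counts = Counter()
for text, label in train:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text: str) -> str:
    scores = {}
    for label in label_counts:
        # Log prior probability of the label.
        score = math.log(label_counts[label] / sum(label_counts.values()))
        total = sum(word_counts[label].values())
        for w in text.split():
            # Add-one (Laplace) smoothed log likelihood of each word.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("profit beat expectations"))  # → positive (on this toy data)
```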


2. Lexicon-based. This methodology counts the number of positive, neutral, and negative words in a body of text and assigns a sentiment score based on intensity and frequency of such words. For example, if a news article covering a product launch by a company includes more positive words than negative ones, overall sentiment would be categorized as positive.
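A minimal Python sketch of lexicon-based scoring; the two word lists here are tiny invented examples, whereas real systems draw on curated lexicons such as Loughran-McDonald for financial text:

```python
# Tiny illustrative lexicons; real systems use curated resources.
POSITIVE = {"growth", "beat", "strong", "record", "upgrade"}
NEGATIVE = {"loss", "miss", "weak", "decline", "downgrade"}

def lexicon_sentiment(text: str) -> float:
    """Score in [-1, +1]: (positive hits - negative hits) / total sentiment words."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # neutral: no sentiment-bearing words found
    return (pos - neg) / (pos + neg)

print(lexicon_sentiment("record growth and strong demand despite one weak segment"))
# → 0.5 (three positive words, one negative)
```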


3. Linguistic rules-based. This popular approach applies a set of predefined, handcrafted rules and patterns to identify sentiment-bearing words. It depends heavily on rules (e.g., distinguishing “good” from “not good”) and word lexicons that might not generalize to more nuanced analyses and texts.
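The “good” vs. “not good” distinction can be sketched as a simple handcrafted negation rule in Python; the word lists and the two-word negation window are illustrative assumptions:

```python
POSITIVE = {"good", "strong", "profitable"}
NEGATION = {"not", "no", "never", "hardly"}

def rule_sentiment(text: str) -> int:
    """Sum of polarities, flipping polarity when a negation word precedes a sentiment word."""
    words = text.lower().split()
    score = 0
    for i, w in enumerate(words):
        if w in POSITIVE:
            # Rule: a negation within the two preceding words flips the polarity.
            negated = any(prev in NEGATION for prev in words[max(0, i - 2):i])
            score += -1 if negated else 1
    return score

print(rule_sentiment("the results were good"))      # → 1
print(rule_sentiment("the results were not good"))  # → -1
```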


4. Contextual embedding. This neural network-based approach captures the complexity and nuance of word meaning within a given document, based on unsupervised machine learning. Deep learning algorithms create vector representations of words and phrases in relation to the surrounding text, allowing the model to distinguish the word “bank” as a financial institution vs. a river edge.


5. Ensemble. Ensemble sentiment analysis combines multiple models, potentially from different domains, such as applying a model well-trained on social listening to financial reporting data, to produce more accurate and reliable overall results. Techniques include weighting (combining model outputs based on their individual prediction performance), voting (taking the majority vote of individual model predictions), and meta-modeling (using model outputs as inputs to a new master model).
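The voting and weighting techniques can be sketched in Python as follows; the base-model outputs shown are hypothetical:

```python
from collections import Counter

def majority_vote(predictions: list[str]) -> str:
    """Voting ensemble: the label predicted by the most base models wins."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_score(scores: list[float], weights: list[float]) -> float:
    """Weighting ensemble: combine model scores using per-model performance weights."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical outputs from three base models on the same document.
print(majority_vote(["positive", "negative", "positive"]))        # → positive
print(weighted_score([0.8, -0.2, 0.5], weights=[0.5, 0.2, 0.3]))  # blended score
```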


The final step in the sentiment analysis process is aggregating the sentiment scores of individual elements to determine the overall sentiment of the document. This can be accomplished via several methods, including a simple average of scores or a weighted average based on feature importance. The output could take the shape of a visual graph, a sentiment score, or a sentiment label. Performance of the model is then evaluated by calculating key performance indicators, including accuracy, precision, recall, and F1 score.
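A minimal Python sketch of the aggregation step, assuming hypothetical per-feature scores and an invented neutral-band threshold for the labeling:

```python
def aggregate(feature_scores: dict, weights=None) -> float:
    """Aggregate per-feature polarity scores into one document-level score.

    With no weights, a simple average; otherwise a weighted average
    based on feature importance.
    """
    if weights is None:
        return sum(feature_scores.values()) / len(feature_scores)
    total = sum(weights[f] for f in feature_scores)
    return sum(score * weights[f] for f, score in feature_scores.items()) / total

def label(score: float, threshold: float = 0.1) -> str:
    """Map the aggregate score onto a sentiment label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

scores = {"headline": 0.8, "body": 0.2, "quotes": -0.4}
print(label(aggregate(scores)))                                          # → positive
print(label(aggregate(scores, {"headline": 3, "body": 1, "quotes": 1})))  # → positive
```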

About the Authors