OCI Language
Translating Text using OCI AI Services
I’ve written several blog posts on using various features and functions of the OCI AI Services, and the most recent of these have been about some of the Language features. In this blog post, I’ll show you how to use the OCI Language Translation service.
As with the previous posts, there is some initial configuration and setup needed for your computer to access the OCI cloud services. Check out my previous posts on this. The following examples assume you have that configuration set up.
The OCI Translation service can translate text into over 30 different languages, with more being added over time.
There are 3 things needed to use the Translation Service: the text to be translated, the language that text is in, and the language you'd like it translated into. Sounds simple. Well, it kind of is, but some care is needed to ensure it all works smoothly.
Let’s start with the basic setup of importing libraries, reading the config file and initialising the OCI AI Client.
import oci
from oci.config import from_file
#Read in config file - this is needed for connecting to the OCI AI Services
#COMPARTMENT_ID = "ocid1.tenancy.oc1..aaaaaaaaop3yssfqnytz5uhc353cmel22duc4xn2lnxdr4f4azmi2fqga4qa"
CONFIG_PROFILE = "DEFAULT"
config = oci.config.from_file('~/.oci/config', profile_name=CONFIG_PROFILE)
###
ai_language_client = oci.ai_language.AIServiceLanguageClient(config)
Next, we can define what text we want to translate and what language we want to translate it into. In this case, I want to translate the text into French, and to do so, we need to use the language abbreviation.
text_to_trans = "Hello. My name is Brendan and this is an example of using Oracle OCI Language translation service"
print(text_to_trans)
target_lang = "fr"
Next, we need to prepare the text and then send it to the translation service. Then, print the returned object.
t_doc = oci.ai_language.models.TextDocument(key="Demo", text=text_to_trans, language_code="en")
trans_response = ai_language_client.batch_language_translation(oci.ai_language.models.BatchLanguageTranslationDetails(documents=[t_doc], target_language_code=target_lang))
print(trans_response.data)
The returned translated object is the following.
{
"documents": [
{
"key": "Demo",
"source_language_code": "en",
"target_language_code": "fr",
"translated_text": "Bonjour. Je m'appelle Brendan et voici un exemple d'utilisation du service de traduction Oracle OCI Language"
}
],
"errors": []
}
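Once the response is back, the useful fields live under documents. As a small local sketch (no API call needed, and the helper name is my own, not part of the OCI SDK), the following pulls each document's translated text out of a response-shaped dict like the one printed above:

```python
def extract_translations(response):
    """Map each document key to its translated text.

    `response` is a plain dict shaped like the translation response above
    (a `documents` list plus an `errors` list).
    """
    if response.get("errors"):
        raise ValueError("translation batch returned errors: %s" % response["errors"])
    return {doc["key"]: doc["translated_text"] for doc in response["documents"]}

sample = {
    "documents": [
        {
            "key": "Demo",
            "source_language_code": "en",
            "target_language_code": "fr",
            "translated_text": "Bonjour. Je m'appelle Brendan et voici un exemple d'utilisation du service de traduction Oracle OCI Language",
        }
    ],
    "errors": [],
}

print(extract_translations(sample)["Demo"])
```

With the real service, you would pass `trans_response.data` converted to this shape (or access `trans_response.data.documents[0].translated_text` directly, as shown later).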
We can automate this process a little to automatically detect the input language. For example:
source_lang = ai_language_client.detect_dominant_language(oci.ai_language.models.DetectDominantLanguageDetails(text=text_to_trans))
t_doc = oci.ai_language.models.TextDocument(key="Demo", text=text_to_trans, language_code=source_lang.data.languages[0].code)
trans_response = ai_language_client.batch_language_translation(oci.ai_language.models.BatchLanguageTranslationDetails(documents=[t_doc], target_language_code=target_lang))
print(trans_response.data)
And we can also automate translating into multiple different languages.
text_to_trans = "Hello. My name is Brendan and this is an example of using Oracle OCI Language translation service"
print(text_to_trans)
#target_lang = "fr"
target_lang = {"fr":"French", "nl":"Dutch", "de":"German", "it":"Italian", "ja":"Japanese", "ko":"Korean", "pl":"Polish"}
for lang_key, lang_name in target_lang.items():
    t_doc = oci.ai_language.models.TextDocument(key="Demo", text=text_to_trans, language_code="en")
    trans_response = ai_language_client.batch_language_translation(oci.ai_language.models.BatchLanguageTranslationDetails(documents=[t_doc], target_language_code=lang_key))
    ####
    print(' [' + lang_name + '] ' + trans_response.data.documents[0].translated_text)
Hello. My name is Brendan and this is an example of using Oracle OCI Language translation service
[French] Bonjour. Je m'appelle Brendan et voici un exemple d'utilisation du service de traduction Oracle OCI Language
[Dutch] Hallo. Mijn naam is Brendan en dit is een voorbeeld van het gebruik van de Oracle OCI Language vertaalservice.
[German] Hallo. Mein Name ist Brendan und dies ist ein Beispiel für die Verwendung des Oracle OCI Language-Übersetzungsservice
[Italian] Ciao. Il mio nome è Brendan e questo è un esempio di utilizzo del servizio di traduzione di Oracle OCI Language
[Japanese] こんにちは。私の名前はBrendanで、これはOracle OCI Language翻訳サービスの使用例です
[Korean] 안녕하세요. 내 이름은 브렌단이며 Oracle OCI 언어 번역 서비스를 사용하는 예입니다.
[Polish] Dzień dobry. Nazywam się Brendan i jest to przykład korzystania z usługi tłumaczeniowej OCI Language
Unlock Text Analytics with Oracle OCI Python – Part 2
This is my second post on using Oracle OCI Language service to perform Text Analytics. These include Language Detection, Text Classification, Sentiment Analysis, Key Phrase Extraction, Named Entity Recognition, Private Data detection and masking, and Healthcare NLP.
In my previous post (Part 1), I covered examples of Language Detection, Text Classification and Sentiment Analysis.
In this post (Part 2), I’ll cover:
- Key Phrase Extraction
- Named Entity Recognition
- Detecting private information and masking
Make sure you check out Part 1 for details on setting up the client and establishing a connection. These details are omitted in the examples below.
Key Phrase Extraction
Key Phrase Extraction aims to identify the key words and/or phrases in the text. The keywords/phrases are selected based on the main topics in the text, along with a confidence score. The text is parsed to extract the words/phrases that are important in it. This can aid with identifying the key aspects of a document without having to read it. Care is needed, as these words/phrases do not represent the meaning implied in the text.
Using some of the same texts used in Part-1, let’s see what gets generated for the text about a Hotel experience.
t_doc = oci.ai_language.models.TextDocument(
key="Demo",
text="This hotel is a bad place, I would strongly advise against going there. There was one helpful member of staff",
language_code="en")
key_phrase = ai_language_client.batch_detect_language_key_phrases(oci.ai_language.models.BatchDetectLanguageKeyPhrasesDetails(documents=[t_doc]))
print(key_phrase.data)
print('==========')
for i in range(len(key_phrase.data.documents)):
    for j in range(len(key_phrase.data.documents[i].key_phrases)):
        print("phrase: ", key_phrase.data.documents[i].key_phrases[j].text +' [' + str(key_phrase.data.documents[i].key_phrases[j].score) + ']')
{
"documents": [
{
"key": "Demo",
"key_phrases": [
{
"score": 0.9998106383818767,
"text": "bad place"
},
{
"score": 0.9998106383818767,
"text": "one helpful member"
},
{
"score": 0.9944029848214838,
"text": "staff"
},
{
"score": 0.9849306609397931,
"text": "hotel"
}
],
"language_code": "en"
}
],
"errors": []
}
==========
phrase: bad place [0.9998106383818767]
phrase: one helpful member [0.9998106383818767]
phrase: staff [0.9944029848214838]
phrase: hotel [0.9849306609397931]
The output from the Key Phrase Extraction is presented in two formats above. The first is the JSON object returned from the function, containing the phrases and their confidence scores. The second (below the ==========) is the same JSON object, parsed to extract and present the data in a more compact manner.
The next piece of text to be examined is taken from an article on the F1 website about a change of drivers.
text_f1 = "Red Bull decided to take swift action after Liam Lawsons difficult start to the 2025 campaign, demoting him to Racing Bulls and promoting Yuki Tsunoda to the senior team alongside reigning world champion Max Verstappen. F1 Correspondent Lawrence Barretto explains why… Sergio Perez had endured a painful campaign that saw him finish a distant eighth in the Drivers Championship for Red Bull last season – while team mate Verstappen won a fourth successive title – and after sticking by him all season, the team opted to end his deal early after Abu Dhabi finale."
t_doc = oci.ai_language.models.TextDocument(
key="Demo",
text=text_f1,
language_code="en")
key_phrase = ai_language_client.batch_detect_language_key_phrases(oci.ai_language.models.BatchDetectLanguageKeyPhrasesDetails(documents=[t_doc]))
print(key_phrase.data)
print('==========')
for i in range(len(key_phrase.data.documents)):
    for j in range(len(key_phrase.data.documents[i].key_phrases)):
        print("phrase: ", key_phrase.data.documents[i].key_phrases[j].text +' [' + str(key_phrase.data.documents[i].key_phrases[j].score) + ']')
I won’t include all of the output; the following shows the key phrases in the compact format.
phrase: red bull [0.9991468440416812]
phrase: swift action [0.9991468440416812]
phrase: liam lawsons difficult start [0.9991468440416812]
phrase: 2025 campaign [0.9991468440416812]
phrase: racing bulls [0.9991468440416812]
phrase: promoting yuki tsunoda [0.9991468440416812]
phrase: senior team [0.9991468440416812]
phrase: sergio perez [0.9991468440416812]
phrase: painful campaign [0.9991468440416812]
phrase: drivers championship [0.9991468440416812]
phrase: red bull last season [0.9991468440416812]
phrase: team mate verstappen [0.9991468440416812]
phrase: fourth successive title [0.9991468440416812]
phrase: all season [0.9991468440416812]
phrase: abu dhabi finale [0.9991468440416812]
phrase: team [0.9420016064526977]
While some aspects of this are interesting, care is needed not to rely on it too heavily. It really depends on the use case.
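If you do want to use these phrases programmatically, one defensive option is to keep only phrases above a confidence threshold. The helper below is a local sketch of my own (the function name and the 0.99 threshold are arbitrary choices, not part of the OCI API) working on a response-shaped dict:

```python
def phrases_above(response, threshold=0.99):
    """Return (text, score) pairs for key phrases at or above `threshold`."""
    results = []
    for doc in response["documents"]:
        for kp in doc["key_phrases"]:
            if kp["score"] >= threshold:
                results.append((kp["text"], kp["score"]))
    return results

# Shaped like the hotel-review response shown earlier
sample = {
    "documents": [
        {
            "key": "Demo",
            "key_phrases": [
                {"score": 0.9998106383818767, "text": "bad place"},
                {"score": 0.9849306609397931, "text": "hotel"},
            ],
            "language_code": "en",
        }
    ],
    "errors": [],
}

print(phrases_above(sample))  # only "bad place" clears the 0.99 threshold
```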
Named Entity Recognition
Named Entity Recognition is a natural language process for finding particular types of entities, listed as words or phrases, in the text. The named entities come from a defined list of items; for OCI Language, the list is available here. Some named entities have sub-entities. The returned JSON object from the function has a format like the following.
{
"documents": [
{
"entities": [
{
"length": 5,
"offset": 5,
"score": 0.969588577747345,
"sub_type": "FACILITY",
"text": "hotel",
"type": "LOCATION"
},
{
"length": 27,
"offset": 82,
"score": 0.897526216506958,
"sub_type": null,
"text": "one helpful member of staff",
"type": "QUANTITY"
}
],
"key": "Demo",
"language_code": "en"
}
],
"errors": []
}
For each named entity discovered, the returned object will contain the text identified, the entity type, the entity sub-type, the confidence score, the offset and the length.
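The offset and length fields let you locate each entity in the original text. As a quick local check (this is just illustrative Python over the values shown in the response above, not an OCI call):

```python
# The hotel-review text sent to the service
hotel_text = ("This hotel is a bad place, I would strongly advise against "
              "going there. There was one helpful member of staff")

# Offsets and lengths copied from the NER response above
entities = [
    {"offset": 5, "length": 5, "type": "LOCATION", "sub_type": "FACILITY"},
    {"offset": 82, "length": 27, "type": "QUANTITY", "sub_type": None},
]

# Slice each entity back out of the source text using offset and length
for ent in entities:
    span = hotel_text[ent["offset"]:ent["offset"] + ent["length"]]
    print(span, "->", ent["type"])
```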
Using the text samples used previously, let’s see what gets produced. The first example is for the hotel review.
t_doc = oci.ai_language.models.TextDocument(
key="Demo",
text="This hotel is a bad place, I would strongly advise against going there. There was one helpful member of staff",
language_code="en")
named_entities = ai_language_client.batch_detect_language_entities(
batch_detect_language_entities_details=oci.ai_language.models.BatchDetectLanguageEntitiesDetails(documents=[t_doc]))
for i in range(len(named_entities.data.documents)):
    for j in range(len(named_entities.data.documents[i].entities)):
        print("Text: ", named_entities.data.documents[i].entities[j].text, ' [' + named_entities.data.documents[i].entities[j].type + ']'
              + '[' + str(named_entities.data.documents[i].entities[j].sub_type) + ']' + '{offset:'
              + str(named_entities.data.documents[i].entities[j].offset) + '}')
Text: hotel [LOCATION][FACILITY]{offset:5}
Text: one helpful member of staff [QUANTITY][None]{offset:82}
The last two lines above are the formatted output of the JSON object. It contains two named entities. The first one is for the text “hotel” and it has an entity type of Location and a sub-entity type of Facility. The second named entity is for a longer piece of text, and it has an entity type of Quantity but no sub-entity type.
Now let’s see what it creates for the F1 text (the text was given above and the code is the same as above).
Text: Red Bull [ORGANIZATION][None]{offset:0}
Text: swift [ORGANIZATION][None]{offset:25}
Text: Liam Lawsons [PERSON][None]{offset:44}
Text: 2025 [DATETIME][DATE]{offset:80}
Text: Yuki Tsunoda [PERSON][None]{offset:138}
Text: senior [QUANTITY][AGE]{offset:158}
Text: Max Verstappen [PERSON][None]{offset:204}
Text: F1 [ORGANIZATION][None]{offset:220}
Text: Lawrence Barretto [PERSON][None]{offset:237}
Text: Sergio Perez [PERSON][None]{offset:269}
Text: campaign [EVENT][None]{offset:304}
Text: eighth in the [QUANTITY][None]{offset:343}
Text: Drivers Championship [EVENT][None]{offset:357}
Text: Red Bull [ORGANIZATION][None]{offset:382}
Text: Verstappen [PERSON][None]{offset:421}
Text: fourth successive title [QUANTITY][None]{offset:438}
Text: Abu Dhabi [LOCATION][GPE]{offset:545}
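With longer texts like this, it can be handy to group the discovered entities by type. A small local sketch (plain Python over (text, type) pairs pulled from the response; the helper is my own, not part of the SDK):

```python
from collections import defaultdict

def group_by_type(entities):
    """Group entity texts by their entity type."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["type"]].append(ent["text"])
    return dict(grouped)

# A few of the entities from the F1 output above
f1_entities = [
    {"text": "Red Bull", "type": "ORGANIZATION"},
    {"text": "Liam Lawsons", "type": "PERSON"},
    {"text": "Yuki Tsunoda", "type": "PERSON"},
    {"text": "Abu Dhabi", "type": "LOCATION"},
]

print(group_by_type(f1_entities))
```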
Detect Private Information and Masking
The ability to perform data masking has been available in SQL for a long time. But there are lots of scenarios where masking is needed and you are not using a database, or not at that particular point in time.
With Detect Private Information, or Personally Identifiable Information (PII), the OCI AI function searches for data that is personal and gives you options on how to present it back to the users. Examples of the types of data, or entity types, it will detect include Person, Address, Age, SSN, Passport, Phone Numbers, Bank Accounts, IP Address, Cookie details, Private and Public keys, various OCI-related information, etc. The list goes on. Check out the documentation for more details on these. Unfortunately, the documentation for how the Python API works is very limited.
The examples below illustrate some of the basic options. But there is lots more you can do with this feature, like defining your own rules.
For these examples, I’m going to use the following text which I’ve assigned to a variable called text_demo.
Hi Martin. Thanks for taking my call on 1/04/2025. Here are the details you requested. My Bank Account Number is 1234-5678-9876-5432 and my Bank Branch is Main Street, Dublin. My Date of Birth is 29/02/1993 and I’ve been living at my current address for 15 years. Can you also update my email address to brendan.tierney@email.com. If toy have any problems with this you can contact me on +353-1-493-1111. Thanks for your help. Brendan.
m_mode = {"ALL":{"mode":'MASK'}}
t_doc = oci.ai_language.models.TextDocument(key="Demo", text=text_demo,language_code="en")
pii_entities = ai_language_client.batch_detect_language_pii_entities(oci.ai_language.models.BatchDetectLanguagePiiEntitiesDetails(documents=[t_doc], masking=m_mode))
print(text_demo)
print('--------------------------------------------------------------------------------')
print(pii_entities.data.documents[0].masked_text)
print('--------------------------------------------------------------------------------')
for i in range(len(pii_entities.data.documents)):
    for j in range(len(pii_entities.data.documents[i].entities)):
        print("phrase: ", pii_entities.data.documents[i].entities[j].text +' [' + str(pii_entities.data.documents[i].entities[j].type) + ']')
Hi Martin. Thanks for taking my call on 1/04/2025. Here are the details you requested. My Bank Account Number is 1234-5678-9876-5432 and my Bank Branch is Main Street, Dublin. My Date of Birth is 29/02/1993 and I've been living at my current address for 15 years. Can you also update my email address to brendan.tierney@email.com. If toy have any problems with this you can contact me on +353-1-493-1111. Thanks for your help. Brendan.
--------------------------------------------------------------------------------
Hi ******. Thanks for taking my call on *********. Here are the details you requested. My Bank Account Number is ******************* and my Bank Branch is Main Street, Dublin. My Date of Birth is ********** and I've been living at my current address for ********. Can you also update my email address to *************************. If toy have any problems with this you can contact me on ***************. Thanks for your help. *******.
--------------------------------------------------------------------------------
phrase: Martin [PERSON]
phrase: 1/04/2025 [DATE_TIME]
phrase: 1234-5678-9876-5432 [CREDIT_DEBIT_NUMBER]
phrase: 29/02/1993 [DATE_TIME]
phrase: 15 years [DATE_TIME]
phrase: brendan.tierney@email.com [EMAIL]
phrase: +353-1-493-1111 [TELEPHONE_NUMBER]
phrase: Brendan [PERSON]
The above is the basic level of masking.
A second option is to use the REMOVE mask. For this, change the mask format to the following.
m_mode = {"ALL":{'mode':'REMOVE'}}
For this option, the identified information is removed from the text.
Hi . Thanks for taking my call on . Here are the details you requested. My Bank Account Number is and my Bank Branch is Main Street, Dublin. My Date of Birth is and I've been living at my current address for . Can you also update my email address to . If toy have any problems with this you can contact me on . Thanks for your help. .
--------------------------------------------------------------------------------
phrase: Martin [PERSON]
phrase: 1/04/2025 [DATE_TIME]
phrase: 1234-5678-9876-5432 [CREDIT_DEBIT_NUMBER]
phrase: 29/02/1993 [DATE_TIME]
phrase: 15 years [DATE_TIME]
phrase: brendan.tierney@email.com [EMAIL]
phrase: +353-1-493-1111 [TELEPHONE_NUMBER]
phrase: Brendan [PERSON]
For the REPLACE option we have.
m_mode = {"ALL":{'mode':'REPLACE'}}
This gives us the following, where we can see the key information is removed and replaced with the name of the Entity Type.
Hi <PERSON>. Thanks for taking my call on <DATE_TIME>. Here are the details you requested. My Bank Account Number is <CREDIT_DEBIT_NUMBER> and my Bank Branch is Main Street, Dublin. My Date of Birth is <DATE_TIME> and I've been living at my current address for <DATE_TIME>. Can you also update my email address to <EMAIL>. If toy have any problems with this you can contact me on <TELEPHONE_NUMBER>. Thanks for your help. <PERSON>.
--------------------------------------------------------------------------------
phrase: Martin [PERSON]
phrase: 1/04/2025 [DATE_TIME]
phrase: 1234-5678-9876-5432 [CREDIT_DEBIT_NUMBER]
phrase: 29/02/1993 [DATE_TIME]
phrase: 15 years [DATE_TIME]
phrase: brendan.tierney@email.com [EMAIL]
phrase: +353-1-493-1111 [TELEPHONE_NUMBER]
phrase: Brendan [PERSON]
We can also change the character used for the masking. In this example, we change the masking character to the + symbol.
m_mode = {"ALL":{'mode':'MASK','maskingCharacter':'+'}}
Hi ++++++. Thanks for taking my call on +++++++++. Here are the details you requested. My Bank Account Number is +++++++++++++++++++ and my Bank Branch is Main Street, Dublin. My Date of Birth is ++++++++++ and I've been living at my current address for ++++++++. Can you also update my email address to +++++++++++++++++++++++++. If toy have any problems with this you can contact me on +++++++++++++++. Thanks for your help. +++++++.
I mentioned at the start of this section that there are lots of options available to you, including defining your own rules, using regular expressions, etc. Let me know if you’re interested in exploring some of these and I can share a few more examples.
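To get a feel for what the REPLACE mode is doing, here is a purely local sketch (my own illustration, not the OCI implementation) that substitutes each detected entity's text with its entity type, longest match first to avoid a shorter entity clobbering part of a longer one:

```python
def replace_pii(text, entities):
    """Replace each entity occurrence with <TYPE>, mimicking the REPLACE mode."""
    # Longest entity text first, so shorter matches can't split longer ones
    for ent in sorted(entities, key=lambda e: len(e["text"]), reverse=True):
        text = text.replace(ent["text"], "<" + ent["type"] + ">")
    return text

# A shortened version of the demo text, with entities as the service reported them
sample = "Hi Martin. Contact me on +353-1-493-1111. Thanks, Brendan."
entities = [
    {"text": "Martin", "type": "PERSON"},
    {"text": "+353-1-493-1111", "type": "TELEPHONE_NUMBER"},
    {"text": "Brendan", "type": "PERSON"},
]

print(replace_pii(sample, entities))
# Hi <PERSON>. Contact me on <TELEPHONE_NUMBER>. Thanks, <PERSON>.
```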
Unlock Text Analytics with Oracle OCI Python – Part 1
Oracle OCI has a number of features that allow you to perform Text Analytics, such as Language Detection, Text Classification, Sentiment Analysis, Key Phrase Extraction, Named Entity Recognition, Private Data detection and masking, and Healthcare NLP.

While some of these have particular (and in some instances limited) use cases, the following examples will illustrate some of the main features using the OCI Python library. Why am I using Python to illustrate these? This is because most developers are using Python to build applications.
In this post, the Python examples below will cover the following:
- Language Detection
- Text Classification
- Sentiment Analysis
In my next post on this topic, I’ll cover:
- Key Phrase Extraction
- Named Entity Recognition
- Detecting private information and masking
Before you can use any of the OCI AI Services, you need to set up a config file on your computer. This will contain the details necessary to establish a secure connection to your OCI tenancy. Check out this blog post about setting this up.
The following Python examples illustrate what is possible for each feature. In the first example, I include what is needed for the config file. This is not repeated in the examples that follow, but it is still needed.
Language Detection
Let’s begin with a simple example where we provide a simple piece of text and ask the OCI Language service, using OCI Python, to detect the primary language of the text and display some basic information about the prediction.
import oci
from oci.config import from_file
#Read in config file - this is needed for connecting to the OCI AI Services
CONFIG_PROFILE = "DEFAULT"
config = oci.config.from_file('~/.oci/config', profile_name=CONFIG_PROFILE)
###
ai_language_client = oci.ai_language.AIServiceLanguageClient(config)
# French :
text_fr = "Bonjour et bienvenue dans l'analyse de texte à l'aide de ce service cloud"
response = ai_language_client.detect_dominant_language(
    oci.ai_language.models.DetectDominantLanguageDetails(
        text=text_fr
    )
)
print(response.data.languages[0].name)
----------
French
In this example, I’ve a simple piece of French (for any native French speakers, I do apologise). We can see the language was identified as French. Let’s have a closer look at what is returned by the OCI function.
print(response.data)
----------
{
"languages": [
{
"code": "fr",
"name": "French",
"score": 1.0
}
]
}
We can see from the above, the object contains the language code, the full name of the language and the score to indicate how strong or how confident the function is with the prediction. When the text contains two or more languages, the function will return the primary language used.
Note: OCI Language can detect at least 113 different languages. Check out the full list here.
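Since the languages list in the response can contain more than one candidate, a small helper (a local sketch over the response-shaped dict; the scores here are made up for illustration) can pick the highest-scoring one:

```python
def dominant_language(languages):
    """Return the (name, score) of the highest-scoring detected language."""
    best = max(languages, key=lambda lang: lang["score"])
    return best["name"], best["score"]

# Hypothetical scores for a text mixing two languages
sample = [
    {"code": "fr", "name": "French", "score": 0.98},
    {"code": "en", "name": "English", "score": 0.02},
]
print(dominant_language(sample))
```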
Let’s give it a try with a few other languages, including Irish, which is localised to certain parts of Ireland. Using the same code as above, I’ve included the same statement (Google) translated into other languages. The code loops through each text statement and detects the language.
import oci
from oci.config import from_file
###
CONFIG_PROFILE = "DEFAULT"
config = oci.config.from_file('~/.oci/config', profile_name=CONFIG_PROFILE)
###
ai_language_client = oci.ai_language.AIServiceLanguageClient(config)
# French :
text_fr = "Bonjour et bienvenue dans l'analyse de texte à l'aide de ce service cloud"
# German:
text_ger = "Guten Tag und willkommen zur Textanalyse mit diesem Cloud-Dienst"
# Danish
text_dan = "Goddag, og velkommen til at analysere tekst ved hjælp af denne skytjeneste"
# Italian
text_it = "Buongiorno e benvenuti all'analisi del testo tramite questo servizio cloud"
# English:
text_eng = "Good day, and welcome to analysing text using this cloud service"
# Irish
text_irl = "Lá maith, agus fáilte romhat chuig anailís a dhéanamh ar théacs ag baint úsáide as an tseirbhís scamall seo"
for text in [text_eng, text_ger, text_dan, text_it, text_irl]:
    response = ai_language_client.detect_dominant_language(
        oci.ai_language.models.DetectDominantLanguageDetails(
            text=text
        )
    )
    print('[' + response.data.languages[0].name + ' ('+ str(response.data.languages[0].score) +')' + '] '+ text)
----------
[English (1.0)] Good day, and welcome to analysing text using this cloud service
[German (1.0)] Guten Tag und willkommen zur Textanalyse mit diesem Cloud-Dienst
[Danish (1.0)] Goddag, og velkommen til at analysere tekst ved hjælp af denne skytjeneste
[Italian (1.0)] Buongiorno e benvenuti all'analisi del testo tramite questo servizio cloud
[Irish (1.0)] Lá maith, agus fáilte romhat chuig anailís a dhéanamh ar théacs ag baint úsáide as an tseirbhís scamall seo
When you run this code yourself, you’ll notice how quick the response time is for each.
Text Classification
Now that we can perform some simple language detection, we can move on to some more insightful functions. The first of these is Text Classification, which analyses the text to identify the categories of what it covers, along with a confidence score. Let’s have a look at an example using the English version of the text used above. This time, we need to perform two steps: the first is to set up and prepare the document to be sent; the second is to perform the classification.
### Text Classification
text_document = oci.ai_language.models.TextDocument(key="Demo", text=text_eng, language_code="en")
text_class_resp = ai_language_client.batch_detect_language_text_classification(
    batch_detect_language_text_classification_details=oci.ai_language.models.BatchDetectLanguageTextClassificationDetails(
        documents=[text_document]
    )
)
print(text_class_resp.data)
----------
{
"documents": [
{
"key": "Demo",
"language_code": "en",
"text_classification": [
{
"label": "Internet and Communications/Web Services",
"score": 1.0
}
]
}
],
"errors": []
}
We can see it has correctly identified that the text is referring to, or is about, “Internet and Communications/Web Services”. For a second example, let’s use some text about F1. The text is taken from an article on the F1 app and refers to the recent driver changes; we’ll use the first two paragraphs. Running the same classification code on that text gives the following response.
{
"documents": [
{
"key": "Demo",
"language_code": "en",
"text_classification": [
{
"label": "Sports and Games/Motor Sports",
"score": 1.0
}
]
}
],
"errors": []
}
We can format this response object as follows.
print(text_class_resp.data.documents[0].text_classification[0].label
+ ' [' + str(text_class_resp.data.documents[0].text_classification[0].score) + ']')
----------
Sports and Games/Motor Sports [1.0]
It is possible to get multiple classifications being returned. To handle this we need to use a couple of loops.
for i in range(len(text_class_resp.data.documents)):
    for j in range(len(text_class_resp.data.documents[i].text_classification)):
        print("Label: ", text_class_resp.data.documents[i].text_classification[j].label)
        print("Score: ", text_class_resp.data.documents[i].text_classification[j].score)
----------
Label: Sports and Games/Motor Sports
Score: 1.0
Yet again, it correctly identified the topic area of the text. At this point, you are probably starting to get ideas about how this can be used and in what kinds of scenarios. That list will probably get longer over time.
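When several classifications do come back, you may only want the highest-scoring label rather than all of them. A local sketch over one document from a response-shaped dict (the helper name is my own):

```python
def top_label(doc):
    """Return the label with the highest confidence score for one document."""
    best = max(doc["text_classification"], key=lambda tc: tc["score"])
    return best["label"], best["score"]

# Shaped like one entry of the documents list in the response above
sample_doc = {
    "key": "Demo",
    "language_code": "en",
    "text_classification": [
        {"label": "Sports and Games/Motor Sports", "score": 1.0}
    ],
}
print(top_label(sample_doc))
```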
Sentiment Analysis
For Sentiment Analysis, we are looking to gauge the mood or tone of a text. For example, we might be looking to identify opinions, appraisals, emotions, or attitudes towards a topic, person or entity. The function returns an object containing the positive, negative, neutral and mixed sentiments, each with a confidence score. This feature currently supports English and Spanish.
The Sentiment Analysis function provides two ways of analysing the text:
- At a Sentence level
- At an Aspect level. This identifies parts/words/phrases of the text and determines the sentiment for each
Let’s start with the Sentence level Sentiment Analysis with a piece of text containing two sentences with both negative and positive sentiments.
#Sentiment analysis
text = "This hotel was in poor condition and I'd recommend not staying here. There was one helpful member of staff"
text_document = oci.ai_language.models.TextDocument(key="Demo", text=text, language_code="en")
text_doc=oci.ai_language.models.BatchDetectLanguageSentimentsDetails(documents=[text_document])
text_sentiment_resp = ai_language_client.batch_detect_language_sentiments(text_doc, level=["SENTENCE"])
print (text_sentiment_resp.data)
The response object gives us:
{
"documents": [
{
"aspects": [],
"document_scores": {
"Mixed": 0.3458947,
"Negative": 0.41229093,
"Neutral": 0.0061426135,
"Positive": 0.23567174
},
"document_sentiment": "Negative",
"key": "Demo",
"language_code": "en",
"sentences": [
{
"length": 68,
"offset": 0,
"scores": {
"Mixed": 0.17541811,
"Negative": 0.82458186,
"Neutral": 0.0,
"Positive": 0.0
},
"sentiment": "Negative",
"text": "This hotel was in poor condition and I'd recommend not staying here."
},
{
"length": 37,
"offset": 69,
"scores": {
"Mixed": 0.5163713,
"Negative": 0.0,
"Neutral": 0.012285227,
"Positive": 0.4713435
},
"sentiment": "Mixed",
"text": "There was one helpful member of staff"
}
]
}
],
"errors": []
}
There are two parts to this object. The first part gives us the overall sentiment for the text, along with the confidence scores for all possible sentiments. The second part breaks the text into individual sentences and gives the sentiment and confidence scores for each sentence. Overall, the text used is “Negative” with a confidence score of 0.41229093. When we look at the sentences, we can see the first sentence is “Negative” and the second sentence is “Mixed”.
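Judging from the response above, the document_sentiment value appears to correspond to the label with the highest score in document_scores, which is easy to verify locally with the numbers from the response:

```python
# document_scores copied from the response above
document_scores = {
    "Mixed": 0.3458947,
    "Negative": 0.41229093,
    "Neutral": 0.0061426135,
    "Positive": 0.23567174,
}

# Pick the label with the highest confidence score
overall = max(document_scores, key=document_scores.get)
print(overall)  # Negative
```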
When we switch to using Aspect we can see the difference in the response.
text_sentiment_resp = ai_language_client.batch_detect_language_sentiments(text_doc, level=["ASPECT"])
print (text_sentiment_resp.data)
The response object gives us:
{
"documents": [
{
"aspects": [
{
"length": 5,
"offset": 5,
"scores": {
"Mixed": 0.17299445074935532,
"Negative": 0.8268503302365734,
"Neutral": 0.0,
"Positive": 0.0001552190140712097
},
"sentiment": "Negative",
"text": "hotel"
},
{
"length": 9,
"offset": 23,
"scores": {
"Mixed": 0.0020200687053503,
"Negative": 0.9971282906307877,
"Neutral": 0.0,
"Positive": 0.0008516406638620019
},
"sentiment": "Negative",
"text": "condition"
},
{
"length": 6,
"offset": 91,
"scores": {
"Mixed": 0.0,
"Negative": 0.002300517913679934,
"Neutral": 0.023815747524769032,
"Positive": 0.973883734561551
},
"sentiment": "Positive",
"text": "member"
},
{
"length": 5,
"offset": 101,
"scores": {
"Mixed": 0.10319573538533408,
"Negative": 0.2070680870320537,
"Neutral": 0.0,
"Positive": 0.6897361775826122
},
"sentiment": "Positive",
"text": "staff"
}
],
"document_scores": {},
"document_sentiment": "",
"key": "Demo",
"language_code": "en",
"sentences": []
}
],
"errors": []
}
The different aspects are extracted, and the sentiment for each within the text is determined. What you need to look out for are the labels “text” and “sentiment”.
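Pulling out just those two labels for each aspect is straightforward; a local sketch over the aspects list from a response-shaped dict (helper name is my own):

```python
def aspect_sentiments(aspects):
    """Return (text, sentiment) pairs from the aspects list of one document."""
    return [(a["text"], a["sentiment"]) for a in aspects]

# Trimmed down to the two fields of interest from the response above
sample_aspects = [
    {"text": "hotel", "sentiment": "Negative"},
    {"text": "condition", "sentiment": "Negative"},
    {"text": "member", "sentiment": "Positive"},
    {"text": "staff", "sentiment": "Positive"},
]
print(aspect_sentiments(sample_aspects))
```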