APPLICATION OF ARTIFICIAL INTELLIGENCE WITH NATURAL LANGUAGE PROCESSING FOR QUALITATIVE RESEARCH TEXTS IN THE MEDICAL-PATIENT RELATIONSHIP WITH MENTAL ILLNESS THROUGH THE USE OF MOBILE TECHNOLOGIES

José Vicente Sancho Escrivá; Carlos Fanjul Peyró; María de la Iglesia Vayá; Joaquin A. Montell; María José Escartí Fabra

doi:http://doi.org/10.35669/rcys.2020.10(1).19-41

José Vicente Sancho Escrivá ¹ , Carlos Fanjul Peyró ¹ , María de la Iglesia Vayá ² , Joaquin A. Montell ³ , María José Escartí Fabra ⁴

1 Universitat Jaume I, Spain

2 Unidad Mixta de Imagen Biomédica FISABIO-CIPF, Spain

3 Centro de Investigación Príncipe Felipe, Spain

4 Hospital Clínico Valencia, Spain

Abstract

Artificial Intelligence (AI) continues to position itself in society as a benchmark for technological progress. Within this field, Natural Language Processing (NLP) has gained wide acceptance in disciplines that work with high volumes of data (Big Data). In this framework we want to see what these algorithms contribute when applied to communication in the field of mental health. We establish this NLP methodology on the basis of previous qualitative observations in transcribed texts of focus groups carried out with patients with mental illness, in order to understand whether applying it improves the analysis of the data, as has been shown in previous research, but applied in a novel way to the field of mental health. To do this, Python scripts were executed and the texts were cleaned, classifying the word strings into entities called tokens and eliminating stopwords. Subsequently, the frequency of words and the connection of sentences were analysed, obtaining a set of structures on which to apply Machine Learning techniques using Word2vec and generating vectors from the data, which are represented in n-dimensional graphs where a new vocabulary is configured with words grouped by proximity. We apply a method that yields information which, without algorithmic learning, escapes us in the prior analysis of qualitative research. The main themes found with traditional qualitative analysis are identified in the analysis, mechanising the process and facilitating it. It is also shown that this methodology is applicable in mental health as in other population groups.

APLICACIÓN DE LA INTELIGENCIA ARTIFICIAL CON PROCESAMIENTO DEL LENGUAJE NATURAL PARA TEXTOS DE INVESTIGACIÓN CUALITATIVA EN LA RELACIÓN MÉDICO-PACIENTE CON ENFERMEDAD MENTAL MEDIANTE EL USO DE TECNOLOGÍAS MÓVILES

Resumen

La Inteligencia Artificial (IA) sigue posicionándose en la sociedad como referencia del progreso tecnológico. Dentro de este campo, el Procesamiento de Lenguaje Natural (PLN) alcanza gran aceptación en disciplinas que trabajen con altos volúmenes de datos (Big Data). En este marco queremos ver qué aportan estos algoritmos, pero aplicados a la comunicación en el campo de la salud mental. Establecemos esta metodología con PLN partiendo de observaciones cualitativas previas en textos transcritos de grupos focales realizados a pacientes con enfermedad mental con el objetivo de entender si la aplicación de esta metodología aporta mejora al análisis de los datos como se ha demostrado en investigaciones previas, pero aplicado novedosamente al campo de la salud mental. Para ello se han ejecutado scripts basados en código Python y se han depurado los textos, clasificando las cadenas de palabras en entidades denominadas tokens y eliminando las palabras vacías. Posteriormente, se ha analizado la frecuencia de palabras y la conexión de frases, obteniendo un conjunto de estructuras donde aplicar técnicas de Machine Learning mediante Word2vec y generando vectores sobre los datos quedando representados con gráficas n-dimensionales en donde se configura un nuevo vocabulario con palabras agrupadas por cercanía. Aplicamos un método que sin el aprendizaje algorítmico se nos escapa en el análisis previo de una investigación cualitativa. Se identifican en el análisis los principales temas encontrados con el análisis cualitativo tradicional, mecanizando el proceso y facilitándolo. Se demuestra además que esta metodología es aplicable en la salud mental como en otros grupos de población.

Keywords

artificial intelligence, natural language processing, machine learning, communication, social science, mHealth, mental health.

INTRODUCTION

Artificial Intelligence (AI) continues to position itself in all areas of society as a benchmark of technological progress. This is reflected in the growing number of publications in this field in recent years (Perrault et al., 2019). This discipline, which consists of the ability of machines and algorithms to replicate how a human being thinks and acts (Aghion, Jones, & Jones, 2017) and which seems to be very distant from us, is in fact more integrated into our lives than ever. A simple and commonplace gesture, such as a Google search to gather information, is based on the application of AI which, through algorithms of various kinds, shows us a huge amount of data through information links based on the keywords of our search and other variables such as geolocation.

In this sense, AI, as a benchmark of innovation, offers an opportunity for qualitative research methodologies: it allows data to be processed and mathematical algorithms to be applied to texts and words when a high volume of information and data (Big Data) has to be collected and managed.

AI has applications of high interest in health issues and is applied in many biomedical areas. It can be observed that AI is playing an increasingly important role in biomedicine, not only because of the continuous progress of AI itself, but also because of the innately complex nature of biomedical problems and the suitability of AI to solve such problems. New AI capabilities provide novel solutions for biomedicine, and the development of biomedicine demands new levels of AI capability (Rong, Mendez, Assi, Zhao, & Sawan, 2020). AI technologies can perform a wide range of functions, such as assisting in diagnostic guidance and therapy selection, making risk predictions and stratifying diseases, reducing medical errors and improving productivity (He et al., 2020).

With respect to mental health, the potential applications of AI in psychiatry can be grouped into two broad categories. One category focuses on Natural Language Processing (NLP), which allows the world of computing devices to understand, interpret and manipulate human language. The other category focuses on chatbots, which are digital conversational agents that use AI methods through text and/or voice to mimic human behaviour through an evolving dialogue. Chatbots are seen as a means to provide mental health care in regions with low access to medical care or to people who have difficulty disclosing their feelings to a human being. Chatbots have been shown to be effective in reducing symptoms of depression and anxiety (Brunn, Diefenbacher, Courtet, & Genieys, 2020).

Within this broad field of AI technology, NLP is becoming increasingly popular in disciplines that work with high volumes of data, including the health sector. The set of techniques that make up NLP consists of analysing and representing natural texts by means of software and algorithms at one or more levels of linguistic analysis, with the aim of achieving human-like language processing for specific tasks (Liddy, 2001). In short, NLP techniques consist of the application of AI to the analysis of behavioural data, developed using embedded machine learning techniques after data collection (Rong et al., 2020).

One advantage of applying NLP is that the researcher does not have to interpret the texts; it is the algorithms that learn from the data to generate results. However, an initial limitation may be the software's lack of natural language understanding.

AI techniques have been evolving in the field of discourse analysis, to the point that research with these methodologies and tools now has real-world applications. Work is currently being done, among other areas, on obtaining health information from collected data and on identifying feelings or emotions. In recent years, we have moved from simpler methods that analysed words without identifying sentence structure or the meaning of speech to better systems based on machine learning. These advances in AI allow for a better understanding of language with high-performance tools and methods that allow discourse to be analysed from the data, identifying syntax, semantic information and the context of the discourse itself (Hirschberg & Manning, 2015).

Within this framework, we want to see and understand what this type of algorithm contributes by applying it to the field of communication. Communication is circumscribed within the social sciences, where we usually work with qualitative methodologies, something closer to the word and the descriptive than quantitative methodologies, more focused on numerical and quantifiable data (Taylor & Bogdan, 1987). It is a differentiation framed between the subjectivity and objectivity of the researchers themselves when faced with the selection of the working method. This differentiation has been the subject of many scientific debates questioning qualitative research in the social sciences applied to health for its apparent lack of validity (Steckler, Mcleroy, Goodman, Bird, & Mccormick, 1992).

In any case, one of the arguments that gives legitimacy to this type of methodology over quantitative ones is the naturalness generated in communication between the researcher and the participants selected for the research (Calero, 2000), something that is highly appreciated in the social sciences. This can be observed fundamentally in group interviews, where the focus group technique stands out: it is a procedure that brings together groups of between 3 and 12 participants (Turney & Pocknee, 2005), selected on the basis of specific criteria, with the aim of maintaining a conversation that is as close, natural and horizontal as possible (Morgan & Krueger, 1998) through a set of rigorously elaborated questions with a specific objective. The interview is then transcribed, coded, classified and analysed (Powell & Single, 1996). One of the advantages of this type of method is that it allows a large amount of information, and therefore a large volume of data, to be obtained in a short time (Gibbs, 1997).

The disadvantage of this type of methodology that we wish to address focuses on the management of these groups of data and their interpretation, since it is possible that, among other limitations or errors, a bias may be produced by a research subject who stands out among the selected group or by the interviewer himself (Bertoldi, Fiorito, & Álvarez, 2006).

The Internet and the digital world are a meeting point for information and communication, and a potential channel for improving the relationship and the communication between health professionals and patients (Lupiáñez-Villanueva, 2011).

Qualitative research methods are increasingly used across disciplines because of their ability to help researchers understand participants' perspectives in their own words. However, qualitative analysis is a laborious process that demands considerable resources. To achieve depth, researchers are limited to smaller sample sizes when analysing text data. One potential way to address this limitation is NLP. Qualitative text analysis involves researchers reading data, assigning code labels and iteratively developing results, and NLP has the potential to automate part of this process. Some of the studies that have examined the potential of NLP conclude that this set of techniques provides a basis for faster qualitative coding and a method for validating qualitative findings (Bustos, Pertusa, Salinas, & Iglesia-Vayá, 2019).

Qualitative methods offer enormous potential to contribute to the field of mental health services research, but have the trade-off of being very labour-intensive.

According to our review, this methodology is also novel in the field of mental health and in the analysis of information obtained in focus groups on the use of new communication technologies.

In this case, we started from a methodology of information collection through interviews with two focus groups, each with a sample of 5 participants. These samples were made up of patients with first psychotic episodes who were chosen on the basis of preliminary research studies and who were interviewed by psychiatric professionals, always complying with ethical procedures such as obtaining a voluntarily signed informed consent form. A possible bias therefore arises from the limitations imposed by the mental illness of the interviewed group, although the facilitators of the focus group knew the patients very well, precisely in order to achieve this naturalness in communication and to motivate them to participate.

The two groups of patients differed in what was explored. The first group was asked how far they had adopted new communication technologies, whether they felt able to use the internet and smartphone devices for health issues, and whether they used mobile applications (apps) in this area. They were asked whether they felt that the information they found on health issues was reliable and whether it helped them in their particular cases. They were also asked whether an app on their mobile phone that monitored their health, collected information on how they were doing and reminded them to take their medication could help them personally, and whether they felt it would be valuable to their doctor. Finally, information was collected on whether such an app would give them more autonomy and empowerment over their disease, and whether they felt that using it would improve communication with their doctor.

The conclusions of the first focus group interview established that patients with mental illness use the Internet and information and communication technologies in a similar way to the general population, and 100% considered that mobile device technologies applied to health (mHealth) would help them to improve communication with their doctor and could help them with adherence to treatment and taking medication. In addition, all patients surveyed stated that better communication with their doctor makes them feel better and safer.

The second focus group was designed with a sample of 5 participants, each of whom had piloted a health app for some time on their own or a family member's smartphone. Of the 5 patients, one was lost to follow-up. This app transferred the information that users voluntarily filled in every time the software prompted them via notification, and it had alerts to help them adhere to the treatment of this type of chronic disease. The information was collected and represented in software designed exclusively for medical professionals.

According to the results of this focus group, 75% of participants considered that the mobile phone and new technologies helped them to communicate more with their doctor. Likewise, 75% considered that new technologies applied to healthcare were useful and helped to improve adherence to treatment and daily medication. And unanimously, 100% of the sample considered that better communication with their doctor makes them feel better and safer.

With this previous research as a starting point, a new milestone is posed as to what AI can do to enrich the methodological process of qualitative research in focus groups and in the doctor-patient relationship with mental illness through the use of mobile technologies.

OBJECTIVES

The main objective of this article is to investigate the differences between working methodologies applied to qualitative research in the field of communication and health.

It starts from the subjective interpretation of this field of research based on qualitative methodologies, obtaining a type of results and conclusions, and analyses whether, with the application of AI and machine learning on the same work sources, the results and conclusions vary or are confirmed.

The aim is to take a data-driven approach, applying AI to obtain new, more empirical conclusions through Machine Learning mathematical models applied to the interpretation of transcribed texts in qualitative research, and thereby to enrich the methodological process.

METHODOLOGY

The methodology followed in this research work consisted of applying, within the context of AI, the algorithms that allow us to process and understand natural language using the NLP set of techniques.

The pre-processing of the texts was carried out with Python scripts and a set of NLP-specific libraries, in particular NLTK (Loper & Bird, 2002), which supports easy prototyping and literate programming, and focused on cleaning the texts so as to retain their most relevant data. The character strings were first classified by separating the text into entities called tokens, in this case linguistic tokens, or words, which do not need to be decomposed in further processing (Webster & Kit, 1992). Empty words or stopwords, i.e. those that accompany other words and have no meaning on their own, were then eliminated.

After this pre-processing, in the next (or initial exploration) phase, the cleaning filters usually used to treat text were applied by means of regular expressions: for example, lowercasing the text and eliminating punctuation marks, question marks, extra spaces, tabs, etc.

Subsequently, we focused on the frequency with which words and sentence connections appear, in order to obtain results that would allow us to see how this data set is vectorised and how it is represented graphically in n-dimensional space. In this way, the aim is to identify the possible semantic and syntactic relationships of the words or data processed.

The proposed methodology is detailed below:

SOURCES AND SOFTWARE

The selected texts come from two documents containing two sessions of previous qualitative analysis using the format of focus groups with patients with first psychotic episodes. The two texts are transcripts of these recorded sessions. One session focused on patients who did not have a mobile application; its purpose was to analyse whether the use of a mobile application could improve communication between patients and health specialists. In the second group of patients, the aim was to study the effects of the use of mobile phones in improving doctor-patient communication.

From this point on, the process begins and Jupyter Notebook is used as the interface for creating the scripts in Python. This is a working environment that has been widely accepted by data scientists since its appearance in 2015.

After installing Python 3 through the Anaconda (conda) distribution, Jupyter Notebook was used to separate, through code, the terms that are relevant in a text from those that are not.

PRE-PROCESSING OF TRANSCRIBED TEXTS

In the first instance, a cell was edited in a new Jupyter Notebook. The text was imported into the notebook with "UTF-8" encoding. After loading the transcript of the patients with psychotic symptoms who had no mobile application, the transcribed document was pre-processed. After executing this cell, the first result was obtained and it was possible to move on to the next cell to advance in the processing of the language of the transcribed texts.

Once the process had started, the next step was to manipulate the string of characters. This is one of the key, generic phases usually carried out in text processing. The purpose of this procedure is to clean up the original text of the transcription of the first focus group. With the help of scripts developed ad hoc, both capital letters and accents were removed from the raw imported text in order to label each of the terms more precisely. The lower() function was used to convert the text to lowercase, and the resulting value was displayed on the screen as a string of lowercase characters without accents.
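As an illustration of this cleaning step, the following is a minimal sketch and not the authors' script. It assumes that accents are stripped with Python's standard unicodedata module, since lower() on its own only changes case; the function name normalise and the sample sentence are hypothetical.

```python
import unicodedata


def normalise(text: str) -> str:
    """Lowercase the text and strip accents (combining diacritical marks)."""
    lowered = text.lower()
    # NFD decomposition separates each base letter from its combining accent.
    decomposed = unicodedata.normalize("NFD", lowered)
    # Dropping combining marks removes accents; note that "ñ" also becomes "n".
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical sample sentence, not taken from the transcripts.
sample = "La Aplicación del Móvil ayuda con la MEDICACIÓN"
print(normalise(sample))
```

Whether the original scripts kept "ñ" or handled accents in another way is not stated in the text, so this choice is an assumption.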

The next phase consisted of using regular expressions to remove the repetition patterns inherent in the transcription of the texts. To do this, the pre-processed string of characters was searched from the beginning for a pattern of terms that the transcription of the focus group itself repeated. These terms were the ones that named the moderators of the group session each time they intervened to talk to the patients.
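A pattern search of this kind could look like the sketch below. The speaker labels ("moderador 1", "moderador 2"), the "label:" layout at the start of each turn and the function name are assumptions made for illustration; the labels actually used in the transcripts are not given in the text.

```python
import re

# Hypothetical moderator labels; the real ones are not given in the article.
MODERATOR_LABELS = ["moderador 1", "moderador 2"]


def remove_speaker_labels(text, labels=MODERATOR_LABELS):
    """Delete moderator labels that open a line, e.g. 'moderador 1: ...'."""
    alternatives = "|".join(re.escape(label) for label in labels)
    # ^ anchors at each line start (MULTILINE); [ \t]* avoids eating newlines.
    pattern = r"^(?:" + alternatives + r")[ \t]*:[ \t]*"
    return re.sub(pattern, "", text, flags=re.MULTILINE)


transcript = (
    "moderador 1: usais el movil para temas de salud\n"
    "ab: si, a veces\n"
    "moderador 2: y aplicaciones"
)
cleaned = remove_speaker_labels(transcript)
print(cleaned)
```

In this sketch the two-letter patient initials are left in place, since the text describes removing them later with a word-length filter.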

TOKENISATION, EMPTY WORD REMOVAL AND WORD ROOT EXTRACTION

In a next step, the text was tokenised to remove punctuation marks and spaces from the resulting string of words, except for the full stop, so that the character set was not misinterpreted. The functions re.sub() and replace() were used for this purpose.
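The combination of re.sub() and replace() described here can be sketched as follows. The exact regular expression used by the authors is not given, so the character class below, the function name tokenise and the sample sentence are assumptions.

```python
import re


def tokenise(text):
    """Split text into word tokens, keeping the full stop as its own token."""
    # Replace anything that is not a word character, whitespace or a full stop.
    no_punct = re.sub(r"[^\w\s.]", " ", text)
    # Pad full stops with spaces so they survive as separate tokens.
    spaced = no_punct.replace(".", " . ")
    # split() with no argument also collapses runs of spaces and tabs.
    return spaced.split()


# Hypothetical sample sentence, not taken from the transcripts.
tokens = tokenise("si, creo que la aplicacion ayuda. ¿y el medico?")
print(tokens)
```

Keeping the full stop preserves sentence boundaries, which are needed later when the corpus is split into sentences.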

This step was followed by the data cleaning phase. The aim of this procedure is to eliminate what are known as stopwords, or empty words, such as prepositions or articles, which do not add value to the data tagging. These words are often repeated in text transcriptions and do not capture the essence of the words and expressions that help us to develop natural language processing. Also common in NLP is the process of extracting the roots of words, known as stemming, which was discarded at this stage when it was observed that the result hindered interpretation of the character string, since in this particular context it reduced sensitivity and precision. Subsequently, short words were eliminated by keeping only those that satisfied the condition len(w)>3, that is, by discarding words of three characters or fewer; this also removed remaining empty words and the repeated two-character initials that anonymised the patients each time they intervened in the session.

Finally, once the text had been pre-processed with the script and before studying word frequency, the result of its execution was displayed.
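The filtering described in this phase can be sketched in a few lines. The stopword set below is a tiny hypothetical sample (the text does not specify which Spanish stopword list was used), and filter_tokens is an illustrative name; only the len(w)>3 condition comes from the text.

```python
# Tiny illustrative stopword set; a full Spanish list would be used in practice.
STOPWORDS = {"que", "para", "como", "pero", "esta", "este", "porque", "cuando"}


def filter_tokens(tokens, stopwords=STOPWORDS, min_len=3):
    """Keep tokens that are not stopwords and satisfy len(w) > min_len."""
    return [w for w in tokens if w not in stopwords and len(w) > min_len]


# "ab" stands in for the two-letter initials that anonymise a patient.
tokens = ["ab", "creo", "que", "la", "aplicacion", "ayuda", "para", "el", "medico"]
filtered = filter_tokens(tokens)
print(filtered)
```

The length condition removes short function words and the speaker initials in a single pass, at the cost of also discarding any short content words.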

WORD FREQUENCY STUDY

Among the multiple possibilities of applying NLP techniques, we implemented code libraries that display on a canvas a set of keywords whose size varies with the frequency analysed. In this way, the most repeated words in the texts are presented more prominently than the rest of the data, so that the most relevant ones can be detected visually in a word cloud. To do this, we used the Python wordcloud library, which allows the desired image to be generated, displayed on screen and saved for subsequent analysis. A white background was selected, after initially testing a black one, because it makes the highlighted terms more legible; the code also assigned maximum font sizes among the 100 most prominent words.

The NLP procedure was repeated for the other qualitative research text, from patients in the first psychotic episode programme who did use the mobile app.
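The frequency count that underlies a word cloud can be sketched with the standard library alone. This is not the authors' code: the token list is invented, top_words is a hypothetical helper, and the limit of 100 words simply mirrors the figure mentioned above.

```python
from collections import Counter


def top_words(tokens, n=100):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(n)


# Invented tokens for illustration only.
tokens = [
    "aplicacion", "medico", "aplicacion", "movil",
    "medicacion", "aplicacion", "medico",
]
freqs = top_words(tokens, n=2)
print(freqs)
```

A dictionary built from these pairs is the kind of input that the wordcloud library's WordCloud.generate_from_frequencies method accepts, with options such as background_color and max_words controlling the appearance; the settings the authors actually passed are not given in the text.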

DATA VECTORISATION

To improve natural language processing, the research work focused on analysing how, through the application of machine learning techniques, this set of extracted data was vectorised and how the different tokenised words and terms were related and contextualised.

To do this, we worked with text embeddings, a technology that allows words to be represented as vectors. The tool selected for this purpose was Word2vec, a widely accepted tool developed by Google researchers in 2013 (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013). This AI technology aims to extract syntactic and semantic relations between pre-processed words. In this way, words that share more relations of any kind are represented by points that lie close together in the vector space. The set of embedded points is adjusted within the space as syntactic and semantic associations are observed. The aim of all this is to obtain patterns or interpretations of the data using the algorithms provided by these Machine Learning techniques.

Subsequently, a new script was created in Jupyter Notebook. In this case, the procedure consisted of defining a pattern over the character string to obtain, from the text corpus, a list of separate sentences whose words are delimited by spaces.
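To give an intuition of what "context" means here, the sketch below extracts the (centre word, context word) pairs that a skip-gram style model, one of the two Word2vec training schemes, learns from. It shows only the pair extraction, not the training itself; the window size, the function name and the sample sentence are assumptions, and the text does not say which training scheme or window the authors used.

```python
def context_pairs(sentence, window=2):
    """List (centre, context) pairs for words at most `window` positions apart."""
    pairs = []
    for i, centre in enumerate(sentence):
        start = max(0, i - window)
        end = min(len(sentence), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((centre, sentence[j]))
    return pairs


# Invented sentence for illustration only.
pairs = context_pairs(["aplicacion", "movil", "ayuda", "medico"], window=1)
print(pairs)
```

Words that keep appearing in each other's windows end up with similar vectors, which is why proximity in the embedding space can be read as a contextual relationship.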
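One way to obtain such a list of sentences is sketched below, assuming that the full stops preserved during tokenisation mark the sentence boundaries. The function name and the sample corpus are hypothetical.

```python
import re


def split_sentences(text):
    """Split a cleaned corpus on full stops into lists of space-separated words."""
    fragments = [fragment.strip() for fragment in re.split(r"\.", text)]
    # Drop empty fragments (e.g. after the final full stop) and tokenise the rest.
    return [fragment.split() for fragment in fragments if fragment]


# Invented corpus for illustration only.
corpus = "creo que la aplicacion ayuda. el medico puede ver la informacion. "
sentences = split_sentences(corpus)
print(sentences)
```

A list of token lists of this shape is the usual input format for word embedding libraries.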

LEMMATISATION AND ROOT WORD EXTRACTION

After splitting the string of characters into a list of separate phrases, we proceeded to identify and display the root of the words, eliminating any prefixes or suffixes by means of a process known as stemming. To do this, the stemmer library in Spanish was imported, which allowed us to generate a new list of words.

After obtaining the new set of words, the Word2vec library was applied. To do this, the Gensim library was imported into Python, as well as a set of tools that allow us to convert the words into vectors. These words are represented according to their context in what is known as embedding.

A word vector file was first generated, a process in which the encoding was executed for the vector dimension where the points, data or words were embedded in space. This involved transforming the working file into Word2vec format.

Applying Gensim's word2vec2tensor script (gensim.scripts.word2vec2tensor), two files were extracted: the first, _tensor.tsv, is a 2D tensor file containing the embedded word vectors, and the second, metadata.tsv, contains the set of words.
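The layout of these two files can be illustrated with a small sketch: one tab-separated vector per line in the tensor file and one word per line, in the same order, in the metadata file. The function below is a hypothetical stand-in for the conversion script, and the two-dimensional vectors are invented for illustration; real Word2vec vectors have many more dimensions.

```python
def to_projector_tsv(vectors):
    """Return (tensor_tsv, metadata_tsv) strings for a word -> vector mapping."""
    tensor_lines = []
    metadata_lines = []
    for word, vector in vectors.items():
        # One row of tab-separated numbers per word...
        tensor_lines.append("\t".join(f"{value:.4f}" for value in vector))
        # ...and the matching label on the same row of the metadata file.
        metadata_lines.append(word)
    return "\n".join(tensor_lines) + "\n", "\n".join(metadata_lines) + "\n"


# Invented vectors for illustration only.
vectors = {"movil": [0.12, -0.5], "medic": [0.1, -0.45]}
tensor_tsv, metadata_tsv = to_projector_tsv(vectors)
print(tensor_tsv)
print(metadata_tsv)
```

Keeping the rows of both files aligned is what lets a visualisation tool attach the right word to each point.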

With the words processed and prepared for embedding, they were automatically counted with the code developed.

Subsequently, the same methodology was repeated for the second set of text, that of the patients who did use the mobile application, obtaining a list of metadata different from the previous case for the vector representation process in the Word2vec software.

EMBEDDINGS DISPLAY

The next phase begins with a visualisation of the results: the generated files are loaded into an embedding projector, in this case the open source web application http://projector.tensorflow.org/ (Abadi et al., 2016). The data panel is used to select the files produced by the model generated from our dataset, so that the embedded points can be observed in n-dimensional space in the central, or visualisation, panel.

This makes it possible to display the visualisation of our embeddings easily and automatically, in order to go deeper into the word connections in natural language processing.

We proceeded to review the words considered key and their data connections and associations. The tool searches for the most relevant connections. The first test was made with the word "mobile", which is marked in the embedded point cloud because it is key to the interpretation and was the focus of the initial conversations of the first work sample. In addition, this word appeared prominently in the word cloud generated from the pre-processed text and among the highlighted words of the metadata file in the second NLP phase. From this selection, contextual relationships were observed with words such as "apps" (software specific to mobile devices), the root word "medic", which can be derived from medicine, doctor or medication, or the word "mal" ("bad").

This can be seen in the inspector panel where a set of neighbouring or close data is detailed based on the fact that all these vectors are located in the same space according to cosine similarity.
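The ranking of neighbours by cosine similarity can be written out explicitly. The sketch below uses invented two-dimensional vectors and hypothetical function names; it illustrates the measure itself, cos(u, v) = u·v / (|u||v|), not the projector's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors, k=2):
    """Return the k words whose vectors are most similar to that of `word`."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vector))
        for other, vector in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Invented vectors for illustration only; they are not the study's results.
vectors = {
    "movil": [1.0, 0.0],
    "apps": [0.9, 0.1],
    "medic": [0.5, 0.5],
    "mal": [0.0, 1.0],
}
neighbours = nearest("movil", vectors)
print(neighbours)
```

Cosine similarity compares the direction of two vectors and ignores their length, so two words count as close when they point the same way in the space.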

After several tests on different vectors, the work focused on analysing the vectors specific to communication, to see what results they gave and whether the objective of the research could be confirmed: in other words, to understand whether the use of this type of technology with psychotic patients provided differential results in the interviews carried out, beyond the information we already had and the qualitative interpretations.

RESULTS

PRE-PROCESSING OF TRANSCRIBED TEXTS

After applying the working method with Python, the results of the pre-processing of the transcribed document from the previous qualitative research phase were obtained. Once the first cell was executed in Jupyter Notebook to import the text and start the NLP with Python code and scripts, the first result was obtained and it was possible to move on to the next cell to advance in the processing of the language of the transcribed texts (see example Figure 1).

https://s3-us-west-2.amazonaws.com/typeset-prod-media-server/2312581d-e49a-4fdc-9a9a-b3e0a10a7190image7.png — **Figure 1: First cell of the text import job for processing in NLP.**

Source: Own elaboration from importing the text into the software.

From this step, the results of the pre-processing of the transcribed document from the previous qualitative research phase are observed.

TOKENISATION, EMPTY WORD REMOVAL AND WORD ROOT EXTRACTION

The next step consisted of tokenisation, elimination of empty words and extraction of word roots, obtaining, after the execution of the new cell, a word result from the imported and processed text of the patients in the focus group without the app (see example results in Figure 2), which we then repeated for the other focus group of patients who did use the mobile application.

https://s3-us-west-2.amazonaws.com/typeset-prod-media-server/2312581d-e49a-4fdc-9a9a-b3e0a10a7190image8.png — **Figure 2: Tokenisation, stemming and stopwords.**

Source: Own elaboration based on the execution of the code in Python.

WORD FREQUENCY STUDY

After running the code on the tokenised text of the first group of patients (without an app), the most repeated words in the text were displayed more prominently than the rest of the data. By frequency of appearance, the result was: "application", "well", "can", "medical", "I think", "mobile", "medication", "can", "best" and "do".

https://s3-us-west-2.amazonaws.com/typeset-prod-media-server/2312581d-e49a-4fdc-9a9a-b3e0a10a7190image5.jpg — **Figure 3: Result of the keyword cloud in the sample of patients without an app.**

Source: Own elaboration from running the Python code.

After applying the same methodology to the transcribed text of patients who did use the app, the wordcloud code produced an image with different data. In this case, the words highlighted by their frequency after tokenisation were: "app", "clear", "I think", "thing", "question", "same", "also", "good", "medication". In the two cases investigated, three key words coincide in the result: "application", "well" and "medication".

https://s3-us-west-2.amazonaws.com/typeset-prod-media-server/2312581d-e49a-4fdc-9a9a-b3e0a10a7190image3.jpg — **Figure 4: Result of the keyword cloud in the sample of patients with an app.**

Source: Own elaboration from running the Python code.

LEMMATISATION AND ROOT WORD EXTRACTION

The research phase that identified and displayed the word roots produced a list of metadata after running the software. The list gives a word count for each text analysed, and a comparison shows that the stems "medic" and "aplic" are among the top 4 keywords in both texts. In addition, the metadata "good" ("bien") and "better" ("mejor") appear in the top 10 for patients without an app, while "better" also appears in the top 10 metadata for patients with an app.

Table 1: Extract from the list of metadata generated by applying the software to count word frequency in patients without an app and with an app.

Metadata Patients without app		Metadata Patients with app
Word	Count	Word	Count
si	103	si	185
medic	47	pas	45
aplic	45	aplic	44
hac	44	medic	44
cre	38	cre	40
pued	36	clar	40
com	32	mejor	35
bien	32	pregun	32
mal	26	haz	32
mejor	25	pued	30
movil	23	contest	26
inform	21	igual	24
utiliz	21	bien	23

Source: Own elaboration from running the Python code.
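How counts per root word such as those in Table 1 arise can be sketched as follows. The suffix list below is a crude, hypothetical stand-in for a real Spanish stemmer and is not the algorithm used in the study; the tokens are invented and assumed to be already lowercased with accents removed.

```python
from collections import Counter

# Crude, hypothetical suffix list; a stand-in for a real Spanish stemmer.
# Longer suffixes come first so that they are tried before shorter ones.
SUFFIXES = ("aciones", "acion", "amos", "ando", "ar", "er", "ir", "os", "as", "o", "a", "e")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_counts(tokens):
    """Count how often each root appears in a list of tokens."""
    return Counter(crude_stem(token) for token in tokens)


# Invented tokens, assumed lowercased and with accents already removed.
tokens = ["medico", "medicos", "aplicacion", "aplicaciones", "aplicar", "mejor"]
counts = stem_counts(tokens)
print(counts.most_common())
```

Grouping inflected forms under one root is what allows "medic" or "aplic" to accumulate counts that the individual surface forms would not reach on their own.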

EMBEDDINGS DISPLAY

In the following phase, the generated files were loaded into the embedding projector software to visualise the embedded data. The resulting vector data were examined for the 25 nearest neighbours of the roots of the following keywords: communication, information or misinformation, usefulness, relationship and sensation. The aim was to observe the associations and feelings generated in these cases, both for patients who did not use the mobile health application and for those who had it installed on their smartphones (see example in Figure 5).

https://s3-us-west-2.amazonaws.com/typeset-prod-media-server/2312581d-e49a-4fdc-9a9a-b3e0a10a7190image2.png — **Figure 5: Visual result of the "mobile" vector in the sample of patients without an app.**

Source: Own elaboration from running Word2vec.
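As an illustration of the files loaded into the projector, the sketch below writes word vectors and their labels as two aligned tab-separated streams, one vector per line and one label per line. This layout is an assumption about how the files were prepared; the function name is hypothetical and the three-dimensional vectors are invented toy values, not output from the study.

```python
import io


def write_projector_files(embeddings, vectors_out, metadata_out):
    """Write vectors and their labels as two aligned TSV streams.

    Line i of the vectors stream holds the vector whose label is on
    line i of the metadata stream.
    """
    for word, vector in embeddings.items():
        vectors_out.write("\t".join(f"{x:.6f}" for x in vector) + "\n")
        metadata_out.write(word + "\n")


# Invented toy vectors; real Word2vec vectors have many more dimensions.
embeddings = {"movil": [0.12, -0.40, 0.33], "aplic": [0.10, -0.35, 0.30]}

# In-memory buffers stand in for files opened with open(path, "w").
vectors_tsv, metadata_tsv = io.StringIO(), io.StringIO()
write_projector_files(embeddings, vectors_tsv, metadata_tsv)

print(vectors_tsv.getvalue())
print(metadata_tsv.getvalue())
```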

RESULTS OF VECTORS IN FIRST FOCUS GROUP OF PATIENTS WITHOUT APP

To check whether the final objective of the research was met, we focused on analysing specific vectors centred on the following words: communication, information or misinformation, usefulness, relationship and feeling. This sample was selected to identify the data associated with each of them.

Below we show the result for the vector "inform" in patients without a mobile health application installed: "habl", "frecuent", "preguntar", "posit", "much", "absurd", "aplic", "detect".

For the vector "common" in patients without a mobile health application installed, the result was: "doctors", "technology", "hospital", "awareness", "doubt", "favour", "inconvenient", "count", "appreciate".

As for the vector "disinform" in patients without a mobile health application installed, the result was: "disease", "import", "diagnose", "value", "doctor", "health", "stabiliz", "recaig", "person", "better".

The result of the vector "feeling" in the first sample was: "particul", "sint", "absurd", "selection", "ver", "clinic", "distinta", "difícil", "puntual", "mejor".

Also in the first group of interviews, we analysed the vector "util" with the following result: "enfermedad", "seguimient", "import", "afront", "sencill", "app", "concienci", "relacion", "privac".

And finally, for the vector "relationship" in patients without a mobile app, we obtained the following data: "detection", "doctor", "none", "do", "correspond", "help", "empiez", "util", "none".
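Neighbour lists of the kind reported above can be obtained by ranking every other word by its similarity to the chosen vector. The sketch below assumes cosine similarity as the measure, which the text does not specify, and uses invented two-dimensional vectors purely for illustration; the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(word, embeddings, k=10):
    """Return the k words whose vectors are most similar to that of word."""
    target = embeddings[word]
    scored = [
        (other, cosine(target, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


# Invented toy vectors; the labels are stems from the text, the numbers are not real.
embeddings = {
    "inform": [1.0, 0.0],
    "habl": [0.9, 0.1],
    "pregunt": [0.6, 0.8],
    "mal": [-1.0, 0.0],
}

print(nearest("inform", embeddings, k=2))
```

With real Word2vec output, the same ranking with k=25 would yield the 25 neighbouring points examined for each keyword root.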

RESULTS OF VECTORS IN SECOND FOCUS GROUP OF APP PATIENTS

For the second text, from the focus group of patients who had the health app installed on their mobile device, we analysed the same set of vectors. In this case, the result for the vector "inform" was the following: "better", "same", "ask", "anot", "activ", "common", "check", "exercise", "possible", "great".

For the vector "common" in patients who had the mobile health application installed, the neighbouring vectors were: "medic", "inform", "preguntart", "habit", "bidirectional", "secur", "contest", "coment", "normal", "enriquec" (see example in Figure 6).

https://s3-us-west-2.amazonaws.com/typeset-prod-media-server/2312581d-e49a-4fdc-9a9a-b3e0a10a7190image6.png — **Figure 6: Result of neighbouring vectors for the "common" point, tokenised root of the keyword communication.**

Source: Own elaboration from running Word2vec.

As for the analysis of the vector "sensation" in patients who had the mobile health application installed, the software showed the following connected data: "sintom", "psiquiatr", "nuev", "bidireccional", "comod", "bien", "util", "ocasion", "alarm".

For the vector "util" in patients who had the mobile health application installed: "buen", "sup", "apoy", "comprend", "psicot", "diari", "encuentr", "cuest", "dorm", "coñaz", "quej", "recordatori", "cambi".

And in this group, the analysis of the vector "relation" showed the following processed data: "contig", "tratamient", "notif", "medic", "complet", "utiliz", "resolv", "demuestr", "estres", "descans".

RESULTS OF THE TWO APPLIED TECHNIQUES ON QUALITATIVE RESEARCH METHODOLOGIES

With the set of data and results obtained, we drew up a comparison between the results of the earlier qualitative analysis of the texts with traditional methodology and the new results obtained once the NLP techniques were applied.

The comparative table of results (Table 2) shows that the conclusions of the first research carried out with traditional methodology are replicated, and that the new methodology provides nuances that were lost without the application of NLP.

The results obtained after applying these AI techniques confirmed the results for the categories that the traditional methodology had analysed in a more subjective way. In addition, new results were extracted that had not been identified in the previous study.

With regard to the category of whether new technologies help to improve doctor-patient communication, the starting point was a "yes" in 75% of the cases analysed. After the application of NLP techniques, the neighbouring vectors "inform", "great" and "better", among others, confirm this initial affirmation. New connected information vectors such as "habit", "igual" and "ejerc" also appear.

As for whether new technologies applied to health are useful, where the starting point was a positive affirmation in 75% of cases, NLP again confirms the results with neighbouring vectors such as "good", "support" or "understand". New nuances also appear with the vectors "psicot", "diari" and "dorm".

Finally, the category of whether the patient feels better when communication is better, which in the classical analysis started from a 100% affirmation, is confirmed after NLP with vectors such as "secure", "useful" and "good", and with new nuances such as "alarm" or "psychiatr".

Table 2: Comparative results of the two applied techniques on qualitative research methodologies

Category	Subjective previous conclusions	Conclusions after AI application: confirming neighbouring vectors	Conclusions after AI application: neighbouring vectors providing new results
New technologies help to improve doctor-patient communication	75% yes	“inform”, “enriquec”, “mejor”, “contest”, “genial”, “bidireccional”	“habit”, “igual”, “ejerc”
New technologies applied to health care are useful	75% yes	“buen”, “apoy”, “comprend”, “cambi”	“psicot”, “diari”, “dorm”, “coñaz”
If communication is better the patient feels better	100% yes	“segur”, “util”, “bien”, “comod”	“alarm”, “sintom”, “psiquiatr”

Source: Own elaboration
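The split between confirming neighbours and neighbours that provide new results can be pictured as a simple partition. The sketch below uses the stems from the first row of Table 2; which stems count as confirming remains an analyst's judgement, encoded here as a hypothetical hand-written set rather than something the algorithm decides, and the order of the neighbour list is illustrative.

```python
def split_neighbours(neighbours, confirming_stems):
    """Partition neighbour stems into confirming ones and new ones."""
    confirming = [w for w in neighbours if w in confirming_stems]
    new = [w for w in neighbours if w not in confirming_stems]
    return confirming, new


# Stems from the first row of Table 2, in an illustrative order.
neighbours = [
    "inform", "enriquec", "habit", "mejor", "igual",
    "contest", "genial", "bidireccional", "ejerc",
]

# Hypothetical set: the stems an analyst judged consistent with the
# conclusion of the traditional analysis.
confirming_stems = {"inform", "enriquec", "mejor", "contest", "genial", "bidireccional"}

confirming, new = split_neighbours(neighbours, confirming_stems)
print(confirming)
print(new)
```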

CONCLUSIONS

After applying NLP techniques to texts previously analysed without AI, we can conclude that the connections produced confirm part of the conclusions obtained with the traditional method, and that the new methodology provides nuances that were lost without NLP. In addition, NLP automates the process, which makes it possible to deal with larger databases.

As a result of this research, similarities and differences can be observed between the results of the new methodology and those of the earlier qualitative research work in the field of communication and health, which provides us with new conclusions. This application of AI techniques to patients with mental illness suggests a path with great potential, in a field that continues to grow rapidly in all areas of society.

In this particular case, when examining whether new technologies help to improve communication between doctor and patient, applying NLP connected words such as "great" or "bidirectional". The meaning of these words links with the previous conclusion that they did help in 75% of cases.

The same occurs with the category of whether better communication makes the patient feel better, where the traditional study concluded that this was so in all cases. Studying the connections between words validates these results with associated words such as "secure", "useful" and "good". In addition, there are other connections to take into account, such as "sintom" or "alarm", which are difficult to interpret without context.

AI learns, automates and links neighbouring words semantically in different ways. We are therefore applying a method that surfaces information which, without algorithmic learning, could escape us in the earlier analysis of qualitative research, allowing new conclusions to be drawn from the work. This is known as data-driven analysis (Rodriguez, Sivic, Laptev, & Audibert, 2011).

With this study, we confirm that AI is useful for application in biomedical areas and more specifically in the field of mental health and communication. We conclude, therefore, that one of the possible applications of AI in psychiatry is NLP, applied in this case to texts obtained through focus-group interviews, and that AI allows computers and algorithms to understand, interpret and manipulate human language by automating its processing. We observe that NLP is positioned as a valid technique within AI for discourse analysis, as previously demonstrated.

In any case, this research detected several limitations. Software processing itself restricts natural language and, although subjectivity is isolated, we consider that qualitative texts require mixed techniques in order to enrich qualitative research methodologies (Guetterman et al., 2018). Language is full of words whose meanings and nuances change with context and can even vary according to the communicator's own intention, and this type of technique does not currently resolve that.

Furthermore, in the field of mental health, prosody (the intonation and rhythm of speech, which convey the speaker's emotional state) is a determining factor in understanding how the patient is feeling. Although NLP helps us to automate and connect words, it is not yet self-sufficient in detecting the expressions of emotion associated with them. Another limitation lay in the questions asked by the medical specialists or in the attitude of the patients during the interview sessions: according to the results, the word "yes" appeared most often because patients simply affirmed or nodded in response to what they were asked, leaving few further words to analyse. It can therefore also be concluded that more texts and data are needed in order to draw more reliable and deeper conclusions.

In short, NLP is already a reality and an open field full of opportunities, but it must continue to advance in the immediate future in order to add value to qualitative research and to the texts it is applied to, provided there is enough material to analyse.

In conclusion, understanding the true meaning of a text remains an indisputable challenge. Algorithms are beginning to provide us with information and data of interest, but they still need to evolve so that, as a complement to the subjective interpretations innate to language, they can give us more decisive results in the research we carry out. The most promising way forward for NLP is to understand words processed in specific contexts and domains and to extract a unified meaning from them.

REFERENCES

[1] He, J, Baxter, S L, Xu, J, Xu, J, Zhou, X & Zhang, K . 2019. The practical implementation of artificial intelligence technologies in medicine. Nature medicine 25(1):30–36.

[2] Aghion, P, Jones, B F & Jones, C I . 2017. Artificial intelligence and economic growth (No. w23928). National Bureau of Economic Research.

[3] Taylor, S J & Bogdan, R . 1987. Introducción a los métodos cualitativos de investigación 1.

[4] Lupiáñez-Villanueva, F . 2011. Salud e internet: más allá de la calidad de la información. Revista española de cardiología 64(10):849–850.

[5] Calero, J L . 2000. Investigación cualitativa y cuantitativa. Problemas no resueltos en los debates actuales. Rev. Cubana Endocrinol 11(3):192–200.

[6] https://bit.ly/3bQTdCG

[7] Morgan, D L, Krueger, R A & Scannell, A U . 1998. Planning focus groups. Sage.

[8] Rong, G, Mendez, A, Assi, E B, Zhao, B & Sawan, M . 2020. Artificial Intelligence in Healthcare: Review and Prediction Case Studies. https://bit.ly/30mAgEO

[9] Guetterman, T C, Chang, T, Dejonckheere, M, Basu, T, Scruggs, E & Vydiswaran, V . 2018. Augmenting Qualitative Text Analysis with Natural Language Processing: Methodological Study. Journal of medical Internet research 20(6).

[10] Rodriguez, M, Sivic, J, Laptev, I & Audibert, J Y . 2011. Data-driven crowd analysis in videos. 2011 International Conference on Computer Vision 1235–1242.

[11] Bertoldi, S, Fiorito, M E & Álvarez, M . 2006. Grupo Focal y Desarrollo local: aportes para una articulación teórico-metodológica. Ciencia, docencia y tecnología. 17:111–131 https://bit.ly/2yRjRwr

[12] Webster, J J & Kit, C . 1992. Tokenization as the initial phase in NLP. The 15th International Conference on Computational Linguistics 4.

[13] Powell, R A & Single, H M . 1996. Focus groups. International journal for quality in health care. 8:499–504 https://bit.ly/3aKAWFt

[14] Liddy, E D . 2001. Natural language processing. https://bit.ly/2zHhpJp

[15] Gibbs, A . 1997. Focus groups. Social research update 19(8):1–8.

[16] Perrault, R, Shoham, Y, Brynjolfsson, E, Clark, J, Etchemendy, J, Grosz, B, Lyons, T, Manyika, T, Mishra, S & Niebles, J C . 2019. The AI Index. Annual Report.

[17] Hirschberg, J & Manning, C D . 2015. Advances in natural language processing. Science 349(6245):261–266.

[18] Brunn, M, Diefenbacher, A, Courtet, P & Genieys, W . 2020. The Future is Knocking: How Artificial Intelligence Will Fundamentally Change Psychiatry. Academic Psychiatry, Online.

[19] Abadi, M, Agarwal, A, Barham, P, Brevdo, E, Chen, Z, Citro, C, ... & Ghemawat, S . 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. https://bit.ly/3bQSEZm

[20] Bustos, A, Pertusa, A, Salinas, J M & Iglesia-Vayá, M . 2019. Padchest: A large chest x-ray image dataset with multi-label annotated reports. https://bit.ly/2KLyjbX

[21] Steckler, A, Mcleroy, K R, Goodman, R M, Bird, S T & Mccormick, L . 1992. Toward Integrating Qualitative and Quantitative Methods: An Introduction. Health Education Quarterly 19(1):1–8.

[22] Turney, L & Pocknee, C . 2005. Virtual focus groups: New frontiers in research. International Journal of Qualitative Methods 4(2):32–43.

[23] Loper, E & Bird, S . 2002. NLTK: the natural language toolkit. https://bit.ly/2VJV1Yi

[24] Mikolov, T, Sutskever, I, Chen, K, Corrado, G S & Dean, J . 2013. Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems 3111–3119.