Such drift in suicide risk level can be attributed to personal factors. The time-variant changes in these factors make diagnosis more challenging for MHPs [15]. Even though Electronic Health Records (EHRs) are longitudinal, studies have predominantly relied on time-invariant modeling of content to predict suicide-related ideations, suicide-related behaviors, and suicide attempts [16, 17].

One practical approach is to monitor alternative sources, such as Reddit posts, over a specified time period to detect time-variant language drifts that signal suicide-related ideations and behaviors. Complementing this effort, suicide-related behavioral assessment using time-variant modeling over social media platforms has shown promise [20]. Time-variant modeling involves extracting suicide risk-related information independently from each post in the sequence of posts made by a user.

Such an approach captures explainable patterns in suicide-related ideations, suicide-related behaviors, and suicide attempts, similar to the process MHPs follow when identifying these risk levels.

Of these platforms, patients reported that Reddit was the most beneficial option in helping them cope with mental health disorders because of the pre-categorized mental health-related subreddits that provide an effective support structure.

On per month average, around 1. The analysis of Reddit content is demanding for several reasons, including interaction context, language variation, and the technical determination of clinical relevance.

Correspondingly, the potential rewards of greater insight into mental illness in general, and into suicidal thoughts and behavior specifically, are great. These broader observations of challenges translate into three aspects of modeling and analysis: (1) Determination of User-Types, (2) Determination of Content-Types, and (3) Clinical grounding.

Content-Types on MH-Reddits capture (1) ambiguous and (2) transient postings made by users of different User-Types. A recent survey on suicide risk in the context of social media suggests that existing studies on the topic have ignored clinical guidelines in the risk modeling framework and have relied too heavily on statistical natural language processing [31].

With the exceptions of recent studies by Gaur et al. Although these objective markers can screen patients, their utility to MHPs has not been validated. Li et al. The authors identified a substantial correlation between linguistic cues e. Further, different user types and content types have varied influences on MHCI, which differ from the influence of linguistic cues. Our experiment with a C-SSRS-based labeled dataset revealed that sentiment (Fig 1, left) and emotion (Fig 1, right) factors did not satisfactorily discriminate between suicide risk severity levels [13].

The limitations associated with the annotation process, inter-rater agreement, and clinical translation of prior research have been identified as significant concerns [40, 41]. A more substantial concern is that despite using state-of-the-art algorithms e. Various syntactic and lexical features [20, 43], psycholinguistic features derived from Linguistic Inquiry and Word Count (LIWC) [44], key phrases [45], topic modeling [46, 47], bag-of-words models [48], and other data-driven features have been explored for modeling the language used in online mental health conversations.

Our study aims to fill these gaps in prior studies by taking into account User Types, Content Types, and a ubiquitous clinical screening tool, the C-SSRS, to assess suicide risk severity in both a time-variant and a time-invariant manner [52].

Through the support of practicing psychiatrists, who performed the annotation and achieved substantial inter-rater agreement, we strengthen the clinical relevance of our research and of the inferences derived from its outcomes [53].

While some mental health patients are fortunate enough to see multiple members of a care team on a frequent (weekly at best) basis, most are not seen for months at a time. Any effort to shift this load to human labor would be prohibitively expensive.

Our study can also be used to build tools that quantify suicide risk based on mental health records e. We outline our approach in Fig 2. We make the following key contributions in this research: (a) We create a new Reddit dataset of users with both user-level and post-level annotations following C-SSRS guidelines.

Considering human behavior in online written conversation, we applied the Zipf-Mandelbrot law and a negation resolution method to computationally identify potentially suicidal users. For a detailed description of this method, we refer the reader to Gaur et al. Our selection of resources is governed by the structure of the New York City CDRN warehouse, which provides treatment information, condition descriptions, drug exposure, and observations on mental health conditions [58].

This helped us construct a semantics-preserving and clinically grounded pipeline that addresses significant concerns overlooked by previous studies on MH systems [5, 64]. In the current research, we utilize users, after removing 52 suicide indication users, with post-level and user-level annotations.

For example, consider the following three posts: (P1) I am sick of loss and need a way out; (P2) No way out, I am tired of my losses; (P3) Losses, losses, I want to die. The phrases in P1 and P2 are predictors of suicidal tendencies but are expressed differently, while P3 is explicit [65]. The procedure requires computing the semantic proximity between n-gram phrases and concepts in medical lexicons, which takes into account both syntax and contextual use.

Among the different measures of semantic proximity, we use cosine similarity. The vector representations of the concepts and n-gram phrases are generated using the ConceptNet embedding model [66, 67]. We employ the TwADR and AskaPatient lexicons to normalize informal social media communication to equivalent medical terms [68]. These lexicons were created from drug-related medical knowledge sources and further enriched by mapping Twitter phrases to these concepts using convolutional neural networks (CNNs) trained over millions of tweets [59].

Thus, we obtain syntactically similar normal forms for two semantically equivalent posts. To perform MedNorm, we generated a word vector for each word in the post using ConceptNet. If the cosine similarity is above an empirical threshold of 0. After aggregating their content, we performed MedNorm using the lexicons to generate clinically abstracted content for effective assessment.
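The matching step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-dimensional vectors stand in for ConceptNet embeddings, the concept names and the helper `normalize_phrase` are hypothetical, and the threshold passed in the usage lines is an arbitrary illustrative value because the empirical threshold is not recoverable from the text.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def normalize_phrase(phrase_vec, concept_vecs, threshold):
    """Return (concept, score) for the closest lexicon concept.

    The concept is None when the best score does not exceed `threshold`,
    meaning the phrase is left un-normalized.
    """
    best_concept, best_score = None, -1.0
    for concept, vec in concept_vecs.items():
        score = cosine_similarity(phrase_vec, vec)
        if score > best_score:
            best_concept, best_score = concept, score
    if best_score > threshold:
        return best_concept, best_score
    return None, best_score


# Toy stand-ins for lexicon concept embeddings (hypothetical values).
toy_concepts = {
    "suicidal ideation": [0.9, 0.1, 0.0],
    "insomnia": [0.0, 0.2, 0.9],
}
# Toy stand-in for the embedding of an n-gram such as "need a way out".
phrase_vec = [0.8, 0.2, 0.1]

# 0.8 is an illustrative threshold only, not the value used in the study.
concept, score = normalize_phrase(phrase_vec, toy_concepts, threshold=0.8)
unmatched, _ = normalize_phrase([0.0, 1.0, 0.0], toy_concepts, threshold=0.8)
```

Taking the single best-scoring concept keeps the normalized form deterministic, which is what lets two differently worded but semantically equivalent posts map to the same normal form.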

Privacy or Ethical Concerns: Our current study analyzes community-level posts to examine the utility of social media e. The study does not involve human subjects e. The datasets, to be made public, do not include the screen names of Reddit users and strictly abide by the privacy principles of the Reddit platform. Example posts provided in this article have been modified for anonymity.

Existing studies demonstrate the potential for detecting symptoms of depression by identifying word patterns and topical variations using the PHQ-9 questionnaire, a widely used depression screening tool [27, 69-71].

Though these categories are well suited for clinical settings, their utilization for user assessment of online social media data is not straightforward. This shows a variation in behaviors exhibited by a user. While the C-SSRS categories are clinically defined, there is a need for a lexicon with social media phrases and clinical knowledge that quantifies the relevance of a post to suicidal categories.

These medical knowledge resources are stored in graphical structures with nodes representing medical concepts and linked through medical relationships e. Hence, using these two knowledge sources that contain semantically relevant domain relationships, one can create a mental health knowledge structure for making inferences on suicidal severity.

Following this intuition, the suicide risk severity lexicon for semantically annotating the Reddit dataset was created by Gaur et al. For this research, we removed users labeled with suicide indication, giving a total of users with labels: Supportive, Suicide Ideation, Suicide Behaviors, and Suicide Attempt (see Table 1 for statistics on the annotated data). The Suicide Indication (IN) category separates users using at-risk language from those actively experiencing general or acute symptoms.

Users might express a history of divorce, chronic illness, death in the family, or suicide of a loved one, which are risk indicators on the C-SSRS, but would do so in empathy with users who expressed ideation or behavior, rather than to express a personal desire for self-harm.

In this case, it was deemed appropriate to flag such users as IN because, while they expressed known risk factors that could be monitored, they would count as false positives if they were accepted as individuals experiencing active ideation or behavior.

The users labeled as suicide indication in the Reddit user dataset were removed because of high disagreement between annotators during post-level annotation. Four practicing psychiatrists annotated the dataset with a substantial pairwise inter-rater agreement of 0. The created dataset allows time-invariant suicide risk assessment of an individual on Reddit, ignoring the time-based ordering of posts.

For time-variant suicide risk assessment, the posts needed to be ordered with respect to time and annotated independently. Following the annotation process highlighted in Gaur et al. The annotated dataset of users comprises supportive throwaway account: , Non-throwaway account: and uninformative throwaway account: , Non-throwaway account: posts.

For throwaway accounts, the dataset had 37 supportive users (S), 63 users with suicide ideation (I), 23 users with suicide behavior (B), and 17 users with past experience of a suicide attempt (A). User distribution within non-throwaway accounts is as follows: 85 S users, I users, 76 B users, and 33 A users.

A, B, C, and D are the mental healthcare providers who served as annotators. We explain two competing methodologies, TvarM and TinvM, for suicide risk severity prediction. Prior literature has shown the effectiveness of sequential models e. Moreover, experiments have shown that the sentences an individual forms express their mental state. Hence, these inherent textual features (linguistic use of nouns, pronouns, etc.)

Motivated by prior findings suggesting that LSTMs selectively filter irrelevant information while maintaining temporal relations, we incorporated them into our time-variant framework [76, 77]. It is messed up. I dont even go to the exams, but I tell my parents that this time I might pass those exams and will be able to graduate.

And parents get super excited and proud of me. It is like Im playing some kind of a Illness joke on my poor family. LSTMs learn a representation of a sequence. Our LSTM model predicts the likelihood of each suicidal severity category of a Reddit post, taking into account its sequence of words. However, the representation of a post is learned independently; hence patterns across multiple posts are not recognized.

We require a model that engineers features across multiple posts from a user. Convolutional neural networks (CNNs) are state of the art for such tasks [79]. It comprises an LSTM model to generate probabilities of a post p 0 i , which is a sequence of word embeddings.
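To make the two-stage idea concrete (per-post class probabilities first, then feature extraction across a user's time-ordered posts), here is a deliberately simplified sketch in plain Python. It is not the authors' model: the per-post probabilities are hard-coded stand-ins for LSTM outputs, the fixed averaging kernel replaces learned convolution filters, and every function and field name is hypothetical.

```python
LABELS = ["Supportive", "Ideation", "Behavior", "Attempt"]


def sort_posts_by_time(posts):
    """Order a user's posts chronologically (oldest first)."""
    return sorted(posts, key=lambda p: p["timestamp"])


def conv1d_over_time(prob_seq, kernel):
    """Slide a 1-D kernel along the time axis, separately per class.

    prob_seq is a list of per-post probability vectors. When there are
    fewer posts than kernel taps, the sequence is returned unchanged.
    """
    k = len(kernel)
    if len(prob_seq) < k:
        return [list(p) for p in prob_seq]
    n_classes = len(prob_seq[0])
    features = []
    for t in range(len(prob_seq) - k + 1):
        window = prob_seq[t:t + k]
        features.append(
            [sum(kernel[j] * window[j][c] for j in range(k))
             for c in range(n_classes)]
        )
    return features


def max_pool_over_time(feature_seq):
    """Keep the strongest response per class across all time windows."""
    return [max(column) for column in zip(*feature_seq)]


# Hard-coded stand-ins for the per-post LSTM output (hypothetical values).
posts = [
    {"timestamp": 3, "probs": [0.10, 0.20, 0.60, 0.10]},
    {"timestamp": 1, "probs": [0.70, 0.20, 0.05, 0.05]},
    {"timestamp": 2, "probs": [0.20, 0.50, 0.20, 0.10]},
]

ordered = sort_posts_by_time(posts)
features = conv1d_over_time([p["probs"] for p in ordered], kernel=[0.5, 0.5])
pooled = max_pool_over_time(features)
predicted = LABELS[pooled.index(max(pooled))]
```

In a trained model the kernel weights and the final classifier would be learned; the sketch only shows how time ordering, windowing over adjacent posts, and max-pooling fit together.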

TinvM considers learning over all the posts made by a user to provide a user-level suicidality prediction. For this methodology, we put together all the posts made by the user in SuicideWatch and other mental health-related subreddits, irrespective of time. TinvM can learn rich and complex feature representations of the sentences using a deep CNN.

Our implementation of CNN is well described in Gaur et al. The model takes embeddings of user posts as input and classifies them into one of the suicide risk severity levels. We concatenate embeddings of posts for each user and pass them into the model. Evaluations are performed using the formulations described by Gaur et al.
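The concatenation step can be illustrated with a short sketch. It assumes each post is already a list of token embedding vectors; the fixed input length with zero-padding and truncation is an assumption added here so that every user yields an input of the same shape, and the helper name `build_user_input` is hypothetical.

```python
def build_user_input(post_embeddings, max_tokens, dim):
    """Concatenate all of a user's post embeddings into one fixed-size input.

    post_embeddings: list of posts, each a list of `dim`-sized vectors.
    Posts are joined in the order given, since time is ignored in this
    setting. The result is truncated or zero-padded to `max_tokens` rows.
    """
    tokens = [vec for post in post_embeddings for vec in post]
    tokens = tokens[:max_tokens]
    padding = [[0.0] * dim for _ in range(max_tokens - len(tokens))]
    return tokens + padding


# Two toy posts with 2-dimensional token embeddings (hypothetical values).
user_posts = [
    [[0.1, 0.2], [0.3, 0.4]],  # post 1: two tokens
    [[0.5, 0.6]],              # post 2: one token
]
user_input = build_user_input(user_posts, max_tokens=4, dim=2)
```

The resulting matrix has one row per token and would be the kind of input a CNN classifier consumes; the classifier itself is described in Gaur et al. and is not reproduced here.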

The italicized text marks phrases that contributed to the representation of the post. These phrases were similar to concepts in medical knowledge bases.

We present an evaluation of the two methodologies, TinvM and TvarM, in a cross-validation framework using data from users. We then obtained key insights into throwaway accounts, supportive posts, and uninformative posts. Through an ablation study using different user-types and content-types, we compare the TinvM and TvarM models on user-level suicide risk severity prediction. We began our ablation studies with the TinvM setting, as shown in Table 5a.

As can be seen from the table, experiment S1, which includes throwaway accounts, uninformative posts, and supportive posts, achieved the best performance. The modest improvement in precision and recall in suicidality prediction of throwaway accounts is because of verbosity in content compared to non-throwaway accounts.

While throwaway accounts have largely been ignored in previous studies, we noticed useful information on suicidality in their content (see Table 6) [82]. We hypothesize that this is because users are more open to expressing their emotions and feelings when they can remain anonymous. In another ablation study of TvarM for predicting the suicidality of throwaway accounts, we note a significant decline in false negatives compared to TinvM.

We found supportive posts to be more important than uninformative posts in determining user-level suicidality (S11 in Table 5c). This is because content from a supportive user includes past suicidal experiences, which could be higher in suicide severity, causing the TinvM model to predict false positives. The dense content structure of throwaway accounts at each time step improved the averaged recall in experiment S9 (TvarM) compared to S1 (TinvM).

Thus, the time-variant modeling is akin to a hypothetical bi-weekly diagnostic interview between a patient and a clinician conducted in a clinical setting. The reduction in false positives and false negatives is due to sequence-preserving representations of time-ordered content, capturing local information about suicidality, and keeping important characteristic features across multiple posts through a max-pooled CNN.

In the TinvM context, irrespective of user-type, all types of content are required for high precision and high recall in predicting user-level suicidality.

Lengthy posts expressing mental health conditions are often made by TA (a), which resulted in high precision compared to Non-TA (b).

However, in TvarM, the occasional supportive behavior of suicidal users is important for assessing their suicidality (c). For Non-TA, there is a trade-off between precision and recall with respect to uninformative posts.



