
For intpf miner participants

BurnedOut

Well-Known Member | Joined: Apr 19, 2016 | Messages: 767

Will keep updating as I continue working on this.

Also, I am sorry to inform you that I will inevitably be accessing a lot of users' posts for the sake of statistics. However, they will not be identified anywhere, so none of their data will be misused.

Secondly, after each mining session I will upload all my data to github.com (the link is not up yet) and purge everything I have on my PC. I hope everybody trusts me when I say that I am not doing this to sell anything to Google or advertisers.

11/3/2021

Current plans: perform a very basic analysis of the posts and measure three of the Big Five traits: neuroticism, openness and extraversion. According to research, this is what is feasible for now. Gauging agreeableness will be tricky; I will work on that later.

Also, when sharing your ideas, kindly label them with one of the following:
  1. T: Thread-level ideas, e.g. threads in which everybody is fighting/agreeing/socializing/theorizing. These focus on the thread as a whole, e.g. the evolution of thread complexity over time.
  2. U: User-level ideas, e.g. how likely BurnedOut is to post a sad/happy/analytical thread; trait XYZ can be measured by perusing aspects MNO of posts.
  3. I: Ideas based on interaction between two or more users, e.g. how likely BurnedOut is to cause discussions/debates/arguments in a thread (regardless of whether they are directed at him or at others; the sentiment is gauged here); how likely AK and BurnedOut are to interact in threads in general.
  4. O: Other ideas that do not fit the above categories: general theorizing, idea sharing, and speculations that will HELP the analyzer.
  5. S: Statistics-related ideas: regression methods, formulae, frequency analysis, etc.

Finding time is extremely difficult for me. I am about to start my major, so I am caught up in that storm.
 

BurnedOut

The general analysis of TGM's, OSTS's, AK's and my posts is going to be posted soon. I need to run some valid statistical comparisons to draw verifiable results. In the meantime, I will post basic stuff such as the most-mentioned words.
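For the "most-mentioned words" part, a plain word-frequency count is the usual starting point. A minimal Python sketch, assuming each user's posts are available as plain strings; the tokenizing regex, the tiny stop-word list and the name `most_mentioned_words` are illustrative assumptions, not what was actually used here:

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that", "i"}

def most_mentioned_words(posts, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words across a user's posts.

    Text is lower-cased and split into runs of letters/apostrophes;
    stop words are skipped so that content words surface.
    """
    counts = Counter()
    for post in posts:
        words = re.findall(r"[a-z']+", post.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts.most_common(n)

example_posts = ["I think the data is noisy.", "Noisy data, noisy results!"]
print(most_mentioned_words(example_posts, n=2))
```

Raw counts favour heavy posters, so comparing users is fairer once counts are divided by each user's total word count.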
 

BurnedOut

Dimensions used for comparisons (Penn Treebank part-of-speech tags plus three pronoun groups):
  1. CC: Coordinating conjunctions
  2. IN: Prepositions + subordinating conjunctions
  3. PRP: Personal pronouns + possessive pronouns
  4. DT: Determiners
  5. JJR: Comparative adjectives
  6. NN: Nouns
  7. RB: Adverbs
  8. RBR: Comparative adverbs
  9. VBD: Verbs, past tense
  10. VBP: Verbs, present tense (non-3rd person singular)
  11. Usage of I, me, my
  12. Usage of we, us, our
  13. Usage of you


Everything is expressed as a percentage of the total text harvested.
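As a rough illustration of turning the dimensions above into percentages, here is a minimal Python sketch. It assumes the text has already been split into words and tagged with Penn Treebank tags (for example by a tagger such as NLTK's `nltk.pos_tag`); the name `usage_percentages` and folding possessive pronouns (PRP$) into PRP are assumptions for illustration, not necessarily what the original analysis did. Tags are matched exactly, so NNS or NNP tokens are not counted under NN.

```python
from collections import Counter

# Penn Treebank tags tracked in the post; PRP$ is folded into PRP (assumption).
TAGS = ["CC", "IN", "PRP", "DT", "JJR", "NN", "RB", "RBR", "VBD", "VBP"]
PRONOUN_GROUPS = {
    "I/me/my": {"i", "me", "my"},
    "we/us/our": {"we", "us", "our"},
    "you": {"you"},
}

def usage_percentages(tagged):
    """Percentage of all tokens in each tracked tag and pronoun group.

    tagged: list of (word, tag) pairs, e.g. the output of a POS tagger.
    """
    total = len(tagged)
    if total == 0:
        return {}
    tag_counts = Counter("PRP" if tag == "PRP$" else tag for _, tag in tagged)
    result = {tag: 100 * tag_counts[tag] / total for tag in TAGS}
    words = [word.lower() for word, _ in tagged]
    for name, group in PRONOUN_GROUPS.items():
        result[name] = 100 * sum(w in group for w in words) / total
    return result

sample = [("I", "PRP"), ("think", "VBP"), ("you", "PRP"), ("and", "CC"),
          ("my", "PRP$"), ("dog", "NN"), ("argued", "VBD"), ("more", "RBR")]
print(usage_percentages(sample))
```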



Total word count used for each participant:
  1. BurnedOut: 65096
  2. Animekitty: 163096
  3. onesteptwostep: 16956
  4. EndogenousRebel: 80282
  5. The Grey Man: 40250
The discrepancy in word counts is a result of uneven posting frequency, but I believe this is a large enough sample for analysis.



Psychometric correlations according to research
  1. Average word length of nouns and adjectives: Verbal intelligence
  2. Conjunctions: Analytical-ness
  3. Past tense verbs: Formality and usually found in analytical texts
  4. Present tense forms: Narrative style of thinking, less complex
  5. Usage of I:
    1. If higher then reflects self-consciousness
    2. If lower then formality, emotional distancing and display of power
  6. Usage of you:
    1. Measure of dominance and aggressiveness
  7. Usage of we, our, us
    1. Dominance
    2. Social cohesiveness
  8. Comparative adjectives and adverbs: Analyticality of the text
  9. Lots of pronoun usage: Social post, less likely to be analytical and more expressive.
Obviously, this is very basic. The reality is more complex, but I will update the posts as time permits and as my data-mining abilities grow.
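Item 1 (average word length of nouns and adjectives) is simple to compute from tagged text. A minimal sketch, assuming the same (word, tag) pairs a POS tagger produces; the name `average_word_length` is hypothetical, and counting every NN* and JJ* tag (so proper nouns and comparative adjectives are included) is an assumption that may differ from the original:

```python
def average_word_length(tagged, prefixes=("NN", "JJ")):
    """Mean length in characters of words whose tag starts with one of `prefixes`.

    Prefix matching means "NN" also covers NNS/NNP/NNPS and "JJ" covers JJR/JJS.
    Returns 0.0 when no word matches.
    """
    lengths = [len(word) for word, tag in tagged if tag.startswith(prefixes)]
    return sum(lengths) / len(lengths) if lengths else 0.0

print(average_word_length([("complex", "JJ"), ("ideas", "NNS"), ("are", "VBP"), ("fun", "JJ")]))
```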



Means: Represent the general trend.
Variance: Represents the stability of trends. High variance means frequent emotional upheavals.
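One way to produce means and variances like these, assuming each dimension is first computed as a percentage per post (the thread does not say whether posts, threads or months were the unit), is sketched below. `summarize` is a hypothetical helper, and population variance (`statistics.pvariance`) is an assumption; sample variance (`statistics.variance`) would be the alternative.

```python
from statistics import mean, pvariance

def summarize(per_post):
    """Mean (general trend) and variance (stability) for each dimension.

    per_post: list of dicts, one per post, mapping dimension -> percentage.
    Uses population variance; higher variance = less stable usage.
    """
    if not per_post:
        return {}
    summary = {}
    for dim in per_post[0]:
        values = [post[dim] for post in per_post]
        summary[dim] = (mean(values), pvariance(values))
    return summary

print(summarize([{"CC": 1.0}, {"CC": 3.0}, {"CC": 5.0}, {"CC": 7.0}]))
```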
 

BurnedOut


Means

[Six chart attachments with the means (images not available)]
 

BurnedOut


Variances

[Six chart attachments with the variances (images not available)]
 

BurnedOut

Means


Who is the most analytical?

In order to understand this, we are going to consider these parameters:
DT, CC, IN, JJR, RB and RBR.
  1. JJR and RBR are related to comparisons. More comparisons = more distinctions. Therefore, more nuances = more analyticality.
  2. CC and IN point to the complexity of the text: more of them means more predicates, more complicated assertions, etc.
  3. DT points to objectivity. Greater use of a, an and the indicates specifying, discriminating and talking about objects and classifications.
  4. RB adds depth and nuance to meanings. Therefore, greater overall adverb usage points to imparting more depth.
The most decisive factor is CC usage, which is directly linked to the amount of complexity. Lower CC usage points to simpler sentences, which are easy to comprehend and compose because the author does not wish to clog the working memory. However, if CC usage is high and all other factors are low, the author is still building in complexity, because keeping track of propositions across successive conjunctions is heavy on working memory and processing. Therefore, a higher capacity to chain sentences and sustain a chain of reasoning is a sure sign of higher analytical ability.

But the other factors matter too, because the content and diversity of the material also reflect the arguments and propositions considered during analysis. For example, heavy DT usage usually entails minutely categorizing everything in the text. TGM is extremely good at that, and the data rightly reflects it.
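The ranking below is made by eye. As a hypothetical way to make such a ranking reproducible (not the method used in this thread), each of the six dimensions could be standardized across participants as a z-score and the z-scores summed, optionally with a larger weight on CC to reflect it being the most decisive factor. The input must be the per-participant percentages; the function name, the weighting scheme and the toy numbers in the example are illustrative assumptions, not real results.

```python
from statistics import mean, pstdev

# Dimensions treated as signals of analyticality in the post.
ANALYTIC_DIMS = ["DT", "CC", "IN", "JJR", "RB", "RBR"]

def analytic_scores(profiles, weights=None):
    """Rank participants by a weighted sum of z-scores (hypothetical composite).

    profiles: {participant: {dimension: percentage of their text}}
    weights:  optional {dimension: weight}; missing dimensions default to 1.0
    """
    weights = weights or {}
    scores = {name: 0.0 for name in profiles}
    for dim in ANALYTIC_DIMS:
        values = [profile[dim] for profile in profiles.values()]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # everyone scores the same, so this dimension cannot separate them
        for name, profile in profiles.items():
            scores[name] += weights.get(dim, 1.0) * (profile[dim] - mu) / sigma
    # Highest composite score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy numbers only, not the thread's data.
toy = {"A": {d: 2.0 for d in ANALYTIC_DIMS}, "B": {d: 1.0 for d in ANALYTIC_DIMS}}
print(analytic_scores(toy, weights={"CC": 2.0}))
```

With equal weights every dimension counts the same; the weight on CC is a judgment call rather than something the data determines.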

We can hence make some deductions about the real-life behaviour of the contenders. The participants below are listed in descending order of likely analytical capability:

BurnedOut is probably the most analytical due to his high usage of JJR, RBR, RB, CC and IN. In real life, he is likely to be very discerning and distinguishing in his thinking, with complex streams of thought in his head. However, BO is likely to be more focused on particular things than TGM and OSTS, who consider a slightly broader perspective and cover more topics. (For MBTI suckers: strong Ti and Ne)

EndogenousRebel comes second, with a pattern similar to BurnedOut's. BO and ER may well click in real life because both tend to have a nuanced and complex way of thinking. Personality-wise, this aspect of theirs also matches. (For MBTI suckers: strong Ti and Ne)

OSTS is the most balanced of all. Not only is OSTS naturally adept at complex thinking, he is also good at categorizing information, though not as good as TGM, who categorizes best and is the most objective among the participants. OSTS also covers a wide range of topics and tends not to engage in too many comparisons; he does not indulge in nuances as much as BO and ER. However, his high adverb usage combined with low comparative-adverb usage may suggest that his adverbs are either pro-social or more focused on elaborating on the topics he discusses than on critiquing them. His posts may be more generalized, reflecting his thoughts on a topic rather than actively concocting theories and new explanations. (For MBTI suckers: stronger Ne with fairly strong Ti)

AK comes third. AK's posts are fairly complex compared with those of the other contenders (and more complex than TGM's). He categorizes objects the least, suggesting that he talks a lot about emotions, that he is less objective than the rest of us, or both. However, AK tends to mention more nuances than OSTS and TGM (JJR), although when it comes to drawing higher-order nuances, OSTS is better than AK (RBR). (For MBTI suckers: stronger Ti with weak Ne)

TGM comes last. TGM is the taskmaster of categorizing information. He seems to be very specific in what he talks about and is anal about 'A being A and B being B'. However, TGM does not seem to indulge in comparisons as much as his counterparts here. It looks like he prefers discussing his views on things to disputing them.
 

EndogenousRebel

We're all trying our best. Aren't we? | Joined: Jun 13, 2019 | Messages: 461 | Location: Narnia

Maybe check again in a couple of months to see if our behavior changes, why don't you? (Haha) Looking only at mine, obviously, there are a couple of stories that would fit the data, though I only looked at it briefly. People have said I am intimidating, and I plainly see that reflected in the data. My upbringing was fairly hostile. Perhaps break the data down into time segments to see different patterns over time and what other stories might appear.
 

BurnedOut

I have put the project on hold for a while.

@EndogenousRebel I attempted a time-series analysis and intended to post the findings here, but the problem is that nobody here posts consistently, not even monthly. It is therefore difficult to say whether the fluctuations are mainly due to major life events or whether there is indeed a baseline pattern. My own results vary a lot because of my unstable life in the past; however, that does not paint a realistic picture.
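A minimal sketch of monthly bucketing for such a time series, assuming each post has a date and a per-post value for one dimension (e.g. its CC percentage). The `min_posts` threshold (3 by default) is an arbitrary assumption meant to drop months with too few posts to say anything about a baseline, and `monthly_series` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def monthly_series(posts, min_posts=3):
    """Average a per-post value by calendar month, dropping sparse months.

    posts: iterable of (post_date, value) pairs.
    Returns {(year, month): mean value} in chronological order.
    """
    buckets = defaultdict(list)
    for post_date, value in posts:
        buckets[(post_date.year, post_date.month)].append(value)
    return {
        month: mean(values)
        for month, values in sorted(buckets.items())
        if len(values) >= min_posts
    }

example = [(date(2021, 1, 5), 2.0), (date(2021, 1, 9), 4.0),
           (date(2021, 1, 20), 6.0), (date(2021, 2, 1), 9.0)]
print(monthly_series(example))  # February has one post, so it is dropped
```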

I should have used the sum-of-squares technique instead of mean and variance. The outliers that are distorting the real picture might then have been better controlled.

My knowledge of NLP is growing. However, since I dropped math before graduation, I find some advanced concepts harder and harder to understand. Maybe in the future I will do something concrete with this data.
 