Notes

Notes - notes.io

It Is In Any Case Remarkable
Charts are used to measure relative success for a large variety of cultural items. Traditional music charts have been shown to follow self-organizing principles with regard to the distribution of item lifetimes, the on-chart residence times. Here we examine if this observation holds also for (a) music streaming charts (b) book best-seller lists and (c) for social network activity charts, such as Twitter hashtags and the number of comments Reddit postings receive. We find that charts based on the active production of items, like commenting, are more likely to be influenced by external factors, in particular by the 24 hour day-night cycle. External factors are less important for consumption-based charts (sales, downloads), which can be explained by a generic theory of decision-making. In this view, humans aim to optimize the information content of the internal representation of the outside world, which is logarithmically compressed. Further support for information maximization is argued to arise from the comparison of hourly, daily and weekly charts, which allow to gauge the importance of decision times with respect to the chart compilation period.

Isaac Asimov’s science fiction ‘Foundation’ trilogy is based on the presumption that it may be possible to develop a framework which would allow to model and predict the political and economic development of human societies as such (Elkins, 1976; Phillips & Zyglidopoulos, 1999). At the core of the framework in question, ‘psychohistory’, lies the hypothesis that it could be possible to describe collective decision-making by rigorous laws, regardless of what a specific individual decides to do. Individual preferences would average out when populations are large. On a decisively less galactic level, a legitimate question regards the statistics of cultural phenomena produced by large numbers of decisions. Prime candidates are in this regard rankings measuring the economic success of cultural goods, like music albums, books and movies. Here we will show that certain traits of the New York Times best-seller list and the streaming charts compiled on a daily basis by Spotify, a streaming portal, are congruent with the predictions of an information-theoretical theory of human decision-making.

It has been recognized that charts, in particular music charts, like the US-based Billboard charts, provide access to the study of an extended range of socio-cultural developments. Examples are the evolution of harmonic and timbral properties of popular music (Mauch et al., 2015), the influence of gender and race on chart success (Lafrance et al., 2018), and how technological progress influences cultural diversity, respectively concentration processes (Ordanini & Nunes, 2016). Other studies investigated the correlation between acoustic features (Interiano et al., 2018) and repetitive lyrics (Nunes et al., 2015) to market success, the interplay between popularity and significance in popular music (Monechi et al., 2017), and whether there is a trend towards a global musical culture (Achterberg et al., 2011). An alternative venue for the study of long-term societal developments are recognized scientific and cultural awards, like the Nobel price (Gros, 2018).

On an extended level, Schneider & Gros (2019) suggested that music charts are suitable proxies for testing quantitatively the concept of social acceleration (Rosa, 2013), viz the notion that both personal and societal processes proceed nowadays faster than they used to (Wajcman & Dodd, 2016; Vostal, 2017). The rationale for this suggestion is based on the observation that long-term databases of comparable entries are indispensable for studies aiming to measure the evolution of culturally and sociologically relevant time scales, a condition satisfied by charts, music and book charts alike, which provide timelines of five and more decades of weekly compiled data. For the US, UK, German and Dutch music charts it was found (Schneider & Gros, 2019), that the evolution of the album lifetimes and of the weekly rank decay provides evidence for a substantial acceleration, in particular since the rise of the internet. These findings are congruent with the changes observed for the chart diversity and for the dynamics of number-one hits.

Cultural items, like books and music albums, compete for resources, such as monetary budgets, attention and dedication. Competition for time and attention is also a driver for the dynamics of news topics on social networks, a process that has been argued to take place on accelerating time scales (Lorenz-Spreen et al., 2019). The success of social media generated content is closely connected to sharing, which is an aspect not present for classical cultural items. It has been observed in this context, that the statistics of sharing and scientific citations seem to follow identical universal functionalities (Néda et al., 2017).

An accessible example of a social platform is Reddit (Medvedev et al., 2017), a user generated exchange platform for information, opinions and comments. Studies of Reddit data examined (Baumgartner et al., 2020), among other things, the relation between popularity and intrinsic quality (Stoddard, 2015), the stability of online communities with substantial turnover rates (Panek et al., 2018), and the evolution of users interests (Valensise et al., 2019).

Taking the number of comments as a popularity measure, one can generate charts which parallel download and sales charts. One key difference is however that the exact timing of the triggering event is now of relevance. Download charts contain albums that have been released not only the same day, but also considerably in the past. New posts remain in contrast popular on Reddit only for a maximum of one or two days, which implies that the day-night activity cycle of the user could impact the lifetime distribution of Reddit posts. This is indeed what one finds. We argue that this observation implies that the feedback between popularity rankings and user activity interferes with additional time scale, like the 24 hour day-night activity cycle, when the timing of the user activity is strongly correlated with that of the triggering event.

Book and music sales and streaming charts are primarily consumption charts, for which an aggregate information-theoretical theory is proposed. Exactly when a consumption takes place, like the acquisition or the downloading of books and music, is not relevant, in contrast to the posting of Reddit comments, and other social media activities, which are time-relevant because they involve the creation of new items. As a consequence, we find when analyzing Reddit and Twitter comment and sharing charts, that additional driving forces, in particular the 24h day-night cycle, tend to mask the development of fully autonomous self-organization for charts of social media activities.

2 Information theory of human decision-making

On a statistical level, human decision-making may be modeled using an information theoretical ansatz (Gros et al., 2012). Starting from the same basic ansatz as Schneider & Gros (2019), we present here a modified derivation. From an epistemology view, we stress that our ansatz is intended as a possible explanatory framework. Similarly, the data analysis presented further below is not claimed to proof the correctness of the information theoretical ansatz, indicating however that it is a valid contender. From a generalized perspective our ansatz is neutral (Leroi et al., 2020), in the sense that is based not on semantic, but on statistical considerations.

2.1 Decision-making maximizing information

The brain processes and stores information. Given the finite size, information needs to be selected and compressed (Marois & Ivanoff, 2005). A suitable measure for the amount of information encoded by the probability distribution p(s)𝑝𝑠p(s)italic_p ( italic_s ) of a quantity s𝑠sitalic_s is given by the Shannon Entropy H[p]𝐻delimited-[]𝑝H[p]italic_H [ italic_p ],

H[p]=-⟨logp⟩,⟨A⟩=∫𝑑sA(s)p(s),formulae-sequence𝐻delimited-[]𝑝delimited-⟨⟩𝑝delimited-⟨⟩𝐴differential-d𝑠𝐴𝑠𝑝𝑠H[p]=-langlelog prangle,qquadquadlangle Arangle=int ds,A(s),p(s),,italic_H [ italic_p ] = - ⟨ roman_log italic_p ⟩ , ⟨ italic_A ⟩ = ∫ italic_d italic_s italic_A ( italic_s ) italic_p ( italic_s ) , (1)
where ⟨A⟩delimited-⟨⟩𝐴langle Arangle⟨ italic_A ⟩ denotes the expectation value of the function A=A(s)𝐴𝐴𝑠A=A(s)italic_A = italic_A ( italic_s ), see e.g. Gros (2015). We assume that the overall goal of human decisions is to maximize information, in particular the information generated by decision-making processes. As a proxy for the impact of actions, e.g. to buy a cultural item, we examine the statistical properties of the corresponding chart.

Statistically, information maximization is equivalent to maximizing the entropy H[p]𝐻delimited-[]𝑝H[p]italic_H [ italic_p ]. This goal can be achieved in practice only when respecting a given set of constraints. Real-world actions typically need to factor in the amount of effort and time involved, as well as the uncertainty of the outcome, viz the variance. On a statistical level these two constraints are given by the mean and the variance, which are equivalent to the first and the second moment of the probability distribution in question, ⟨s⟩delimited-⟨⟩𝑠langle srangle⟨ italic_s ⟩ and ⟨s2⟩delimited-⟨⟩superscript𝑠2langle s^2rangle⟨ italic_s start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⟩. The constrained maximum entropy distribution is consequently obtained by maximizing the objective function

Φ[p]=H[p]-a⟨s⟩-b⟨s2⟩,Φdelimited-[]𝑝𝐻delimited-[]𝑝𝑎delimited-⟨⟩𝑠𝑏delimited-⟨⟩superscript𝑠2Phi[p]=H[p]-alangle srangle-blangle s^2rangle,,roman_Φ [ italic_p ] = italic_H [ italic_p ] - italic_a ⟨ italic_s ⟩ - italic_b ⟨ italic_s start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⟩ , (2)
where a𝑎aitalic_a and b𝑏bitalic_b are suitable Lagrange multipliers. Using standard variational calculus (Gros, 2015), the distribution function maximizing Φ[p]Φdelimited-[]𝑝Phi[p]roman_Φ [ italic_p ] is found to be a Gaussian with mean μ=-a/(2b)𝜇𝑎2𝑏mu=-a/(2b)italic_μ = - italic_a / ( 2 italic_b ) and variance σ2=1/(2b)superscript𝜎212𝑏sigma^2=1/(2b)italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = 1 / ( 2 italic_b ),

p(s)∼e-as-bs2∼e-(s-μ)2/(2σ2),similar-to𝑝𝑠superscripte𝑎𝑠𝑏superscript𝑠2similar-tosuperscriptesuperscript𝑠𝜇22superscript𝜎2p(s)simmathrme^-as-bs^2simmathrme^-(s-mu)^2/(2sigma^2),,italic_p ( italic_s ) ∼ roman_e start_POSTSUPERSCRIPT - italic_a italic_s - italic_b italic_s start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ∼ roman_e start_POSTSUPERSCRIPT - ( italic_s - italic_μ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / ( 2 italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT , (3)
where normalization factors have been suppressed.

2.2 Human brains compress information logarithmically

When making decisions, which is the distribution function p(s)𝑝𝑠p(s)italic_p ( italic_s ) that is to be maximized statistically using the entropy H[p]𝐻delimited-[]𝑝H[p]italic_H [ italic_p ]? For the case of book and music charts, a prime candidate is the lifetime distribution p(L)𝑝𝐿p(L)italic_p ( italic_L ), which measures the probability that a given item, a book or a music album, remains listed for a period L𝐿Litalic_L, the lifetime. While listed, books and albums receive increased attention, which implies that the listing period constitutes a visible effect of the individual decision to buy a specific cultural item.

From the cognitive perspective, the brain is confronted with two contrasting demands: to store incoming information as faithfully as possible, covering at the same time the extended orders of magnitude characterizing physical stimuli, like sound and light intensity, as well as time scales. An efficient solution for this conundrum is to store information on a compressed scale, for instance logarithmically. This is indeed the case, as expressed by the Weber-Fechner law, which states that the brain discounts sensory stimuli (Hecht, 1924), numbers (Nieder & Miller, 2003; Dehaene, 2003) and time logarithmically (Howard, 2018). It is not a coincidence that we use logarithmic scales, lumen and decibel, to measure light and sound intensities.

2.3 Maximizing compressed information

When performing an operation, like information maximization, the brain uses internal states that we denote here s𝑠sitalic_s. These states are in general related logarithmically to outside quantities, as discussed in the previous section. For the case of the lifetime distribution p(L)𝑝𝐿p(L)italic_p ( italic_L ) we have

s=log(L),p(s)ds=p(L)dL,formulae-sequence𝑠𝐿𝑝𝑠𝑑𝑠𝑝𝐿𝑑𝐿s=log(L),qquadquad p(s)ds=p(L)dL,,italic_s = roman_log ( italic_L ) , italic_p ( italic_s ) italic_d italic_s = italic_p ( italic_L ) italic_d italic_L , (4)
which relates the observable distribution function p(L)𝑝𝐿p(L)italic_p ( italic_L ) with its internal representation, the probability density p(s)𝑝𝑠p(s)italic_p ( italic_s ). Using ds/dL=1/L𝑑𝑠𝑑𝐿1𝐿ds/dL=1/Litalic_d italic_s / italic_d italic_L = 1 / italic_L and the maximum entropy probability density p(s)𝑝𝑠p(s)italic_p ( italic_s ), as given by (3), one finds

p(L)∼1Le-alog(L)-blog2(L)∼e-(log(L)-μ~)2/(2σ~2),similar-to𝑝𝐿1𝐿superscripte𝑎𝐿𝑏superscript2𝐿similar-tosuperscriptesuperscript𝐿~𝜇22superscript~𝜎2p(L)simfrac1Lmathrme^-alog(L)-blog^2(L)simmathrme^-(log(L% )-tildemu)^2/(2tildesigma^2),,italic_p ( italic_L ) ∼ divide start_ARG 1 end_ARG start_ARG italic_L end_ARG roman_e start_POSTSUPERSCRIPT - italic_a roman_log ( italic_L ) - italic_b roman_log start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_L ) end_POSTSUPERSCRIPT ∼ roman_e start_POSTSUPERSCRIPT - ( roman_log ( italic_L ) - over~ start_ARG italic_μ end_ARG ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / ( 2 over~ start_ARG italic_σ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT , (5)
with a log-mean μ~=-(a+1)/(2b)~𝜇𝑎12𝑏tildemu=-(a!+!1)/(2b)over~ start_ARG italic_μ end_ARG = - ( italic_a + 1 ) / ( 2 italic_b ), and a log-variance of σ~2=1/(2b)superscript~𝜎212𝑏tildesigma^2=1/(2b)over~ start_ARG italic_σ end_ARG start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = 1 / ( 2 italic_b ). p(L)𝑝𝐿p(L)italic_p ( italic_L ) corresponds to a log-normal distribution whenever b>0𝑏0b>0italic_b >0, and to a power law,

p(L)∼1(L)a+1,b→0,formulae-sequencesimilar-to𝑝𝐿1superscript𝐿𝑎1→𝑏0p(L)simfrac1(L)^a+1,qquadquad bto 0,,italic_p ( italic_L ) ∼ divide start_ARG 1 end_ARG start_ARG ( italic_L ) start_POSTSUPERSCRIPT italic_a + 1 end_POSTSUPERSCRIPT end_ARG , italic_b → 0 , (6)
if b𝑏bitalic_b vanishes. Power laws, which are frequently observed (Marković & Gros, 2014), are hence a natural outcome of human activities, statistically, whenever the variance is either small or not taken into account.

There are two causalities for why the variance may be of secondary importance. Firstly, when uncertainties are small, viz when fluctuations around the mean are negligible. In this case the b𝑏bitalic_b-term in (5) is numerically small, becoming important only for exceedingly large or small lifetimes L𝐿Litalic_L. A second, more general argument is that one has to invest comparatively more resources to sample the variance than just the mean, in particular when correlations are present (Broersen, 1998). If time is a scarce resource, only the mean can be obtained reliably from a time dependent functionality, such as the lifetime distribution of cultural items. As a consequence it follows, that power laws are present when individual decision times are shorter than typical chart listing duration, with log-normal distributions emerging for extended decision times.

The probability density p(L)𝑝𝐿p(L)italic_p ( italic_L ) is normalizable only for b≥0𝑏0bgeq 0italic_b ≥ 0 if arbitrary large lifetimes L𝐿Litalic_L are allowed. Real-life time intervals are however finite, which makes a negative b𝑏bitalic_b also viable. The values for a+1𝑎1a+1italic_a + 1 and b𝑏bitalic_b obtained throughout this study for different music and social network activity charts are discussed in the next section.

The information-theoretical approach to the statistics of human decision-making implies a characteristic functional dependency for the distribution of the lifetime of cultural items. We tested this dependency for book and album sales charts, for music download charts, as well as for social media comments. Data retrieval and processing is described in Sect. 4. To visualize the possible p(L)𝑝𝐿p(L)italic_p ( italic_L ) distribution functions, figures with log-log representations of lifetime distributions include quadratic fits that correspond to Eq. (3). However, these fits are not claimed to correspond to statistically validated models, as discussed, e.g., by Clauset et al. (2009). In Table 1 and 2 the respective parameters are listed.

3.1 Chart diversity

An intuitive measure for the turnover rate of charts is the chart diversity d=Na/Ns𝑑subscript𝑁𝑎subscript𝑁𝑠d=N_a/N_sitalic_d = italic_N start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT / italic_N start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT, where Nasubscript𝑁𝑎N_aitalic_N start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT is the number of unique titles, here per year, normalized by the number of available slots Nssubscript𝑁𝑠N_sitalic_N start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT (for the whole year). Figure 1 shows the evolution of the chart diversity of the New York Times best-seller list, in comparison with the top 10 and top 100 Billboard album charts. Billboard introduced streaming data into the chart in 2014 but still published a sales chart. This is why the line splits in 2014, with the chart including streaming shown as a dotted line.

The chart diversities of the New York Times best-seller list and of the Billboard album charts have taken a similar evolution over the course of the last three to four decades. This fact is quite remarkable, given that individual music albums and fiction books differ substantially concerning their respective consumption times. Generally one needs substantially longer time to read a novel than to listen to an album. Their overall key core characteristics are on the other hand similar. The average number of weeks a novel spent on the New York Times best-seller list w¯≈1/d¯𝑤1𝑑barwapprox 1/dover¯ start_ARG italic_w end_ARG ≈ 1 / italic_d (where d𝑑ditalic_d is the chart diversity), shrunk from w¯≈1/0.1=10¯𝑤10.110barwapprox 1/0.1=10over¯ start_ARG italic_w end_ARG ≈ 1 / 0.1 = 10 weeks in the 1980s to about w¯≈1/0.3≈3¯𝑤10.33barwapprox 1/0.3approx 3over¯ start_ARG italic_w end_ARG ≈ 1 / 0.3 ≈ 3 in 2010. The equivalent numbers for the top-10 Billboard album charts are w¯≈1/0.1=10¯𝑤10.110barwapprox 1/0.1=10over¯ start_ARG italic_w end_ARG ≈ 1 / 0.1 = 10 in the 1980s and w¯≈1/0.4=2.5¯𝑤10.42.5barwapprox 1/0.4=2.5over¯ start_ARG italic_w end_ARG ≈ 1 / 0.4 = 2.5 in 2010.

The New York Times best-seller list and Billboard album charts are also similar with respect to the dynamics of titles that eventually make it to the top, the number-one titles. For this there are two possible courses. Either a title enters the chart directly at the top, then its number-one position is guaranteed. Alternatively a given title starts lower and works its way up to the top, over the course of several weeks or months. The probability for a number-one title to take the first course of action, Ponesubscript𝑃oneP_rm oneitalic_P start_POSTSUBSCRIPT roman_one end_POSTSUBSCRIPT, is shown in Figure 1. In the 1970s and 1980s close to no number-one best seller entered the New York Times best-seller list at the top. This changed dramatically in the following three to four decades. From the late 2000s onward close to all, about 95%percent9595%95 % of all number-one novels started charting right at the top. This is similar to the evolution the Billboard album charts took. A small difference is that the New York Times best-seller list started the transition about five years prior to the Billboard album chart, however with the evolution taking about ten years longer. In general the Billboard top 10 and top 100 charts differ only marginally.

The changes in Ponesubscript𝑃oneP_rm oneitalic_P start_POSTSUBSCRIPT roman_one end_POSTSUBSCRIPT are also reflected in the time it takes on the average to become number one. Before 1980, when close to all number-one titles had to work their way up to the top, this process took on the average five to eight weeks. Today number-one novels as well as albums become number one in under a week on the average, since almost all number-one titles start as such.

3.2 Musical chart lifetimes

In Figure 2 the lifetime distributions of daily and weekly music album charts are presented, namely for the Billboard album charts, which are published on a weekly basis, and for the respective Spotify download charts.

Comparing the functional dependency of past (1980-1982) and present day (2017-2019) Billboard album lifetimes, one observes a transition from a log-normal distribution to a power law, as parameterized by (3). An equivalent transition is seen between daily and weekly Spotify charts. A caveat is here the outlier that can be observed for the Spotify daily charts at a lifetime of exactly one week. The origin of this outlier is unknown, it may be caused possibly by weekly algorithmic influences, such as maintenance periods. It is in any case remarkable, that the lifetime distributions presented in Figure 2 change on a functional level, either with time or when the charting period is increased. We argue, that a separation of time scales causes both transitions.

Traditionally, music albums were bought in music shops, which involved a personal trip, and hence a comparatively long execution time. Long decision and execution times imply, as argued in Sect. 2.3, that the variance of the album lifetime distribution is relevant, which takes consequently the form of a log-normal distribution. The typical time needed to acquire a music album dropped however substantially below one week when online shopping started to become relevant in the 1990s. Given that one week corresponds to the chart frequency, this development implies that the variance of the album lifetime distribution lost its relevance. The desire to maximize information, the basis of the here discussed theory of statistical decision-making, leads in this case to power laws.

We believe that an analogous argument explains why daily and weekly Spotify album charts show respectively log-normal and power law distributions. On the average, it is presumably not a matter of only a few hours to listen and to appreciate music albums, which may contain a substantial number of titles, but of a few days. This time scale, several days, lies between the frequencies of daily and weekly streaming charts, which would explain why daily and weekly album charts are respectively log-normal and power law distributed.

Whereas the Billboard album charts are sales-based, the respective single charts are compiled predominantly on the basis of airtime statistics, for which the number of times songs are played by radio stations are counted. A direct comparison of album and single charts is in this case not possible (Schneider & Gros, 2019). The equivalent caveat does not hold for the Spotify charts, for which both album and single charts are based on streaming counts.

In Figure 3 the lifetime distributions of daily and weekly Spotify single charts are shown. Both are close to power law distributions with only small quadratic contributions. The overall process, to listen to a music sample, to decide to stream, and to actually do it, can be assumed to take only a few minutes for individual songs, but substantially longer in the case of albums. The observation that already daily single download charts show indications for power law behavior, with respect to the distribution of song lifetimes, is hence consistent with our basic framework.

3.3 New York Times book best sellers

In Figure 4 we present the probability that a given title stays for a certain number of weeks on the fiction best-seller list of the New York Times. In contrast to the top-100 music charts discussed further above, only the top ten titles of the week are included in the NYT best-seller list. In order to compensate for the comparatively limited database, we averaged the lifetime distribution over consecutive ten-year periods. Substantial scattering of the data can be observed nevertheless.

Included in Figure 4 are quadratic fits, -(a+1)s-bs2𝑎1𝑠𝑏superscript𝑠2-(a+1)s-bs^2- ( italic_a + 1 ) italic_s - italic_b italic_s start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, to the log-log representation of the lifetime distribution (compare Eq. (3)). It is notable, that the distribution of book lifetimes evolves from being convex in log-log scale in the 50s and 60s to be being concave, starting from the 80s. The turning point, in the 70s, matches roughly a shallow minimum in the chart diversity, as shown in Figure 1. It is presently unclear what drives this interesting phenomenon. The presence of a finite second-order component, b≠0𝑏0b
eq 0italic_b ≠ 0, indicates in any case that the individual time scales for buying and reading books has not dropped below one week, the charting frequency. PREMIUM EBOOKS appears in the 70s, when the lifetime distribution transits from convex to concave. The power laws seen in the distribution of musical charts emerge in contrast in the concave region, see Figure 2, when an initial b>0𝑏0b>0italic_b >0 progressively vanishes (Schneider & Gros, 2019). This indicates two distinct causalities for the two power laws, for the late-state Billboard lifetime distributions and for the book lifetimes in the 70s.

3.4 Dynamics of Reddit comments

New data items, like posts or images, are at the core of social media activity. The response of other users, in the form of comments or likes, will determine the popularity of social media posts and with this the likelihood of additional likes and comments. In this context we analyzed the statistics of Reddit post commenting, using publicly available data (Reddit Database, 2020). Note, that Reddit is a discussion website organized around collections of user-created discussion boards for individual subjects called “subreddits”. Users can submit posts such as text posts, links, images and videos to a subreddit. Posts are then voted up or down and commented on. While upvoting can be done directly on the post overview screen while scrolling through, in order to comment, users have to actively click on the post.

The most popular posts from each subreddit are also shown on the front page on login. There are multiple options for sorting to determine which posts are the most popular. The old default was “Hot”, which was based on the number of upvotes (log-weighted) and the submission time (recent posts prioritized). In 2018, “Best” was introduced as the new default, which tries in addition to show new content, including weighting based on comment activity. Comment data is publicly available (Reddit Database, 2020), including precise timestamps, vote counts are available however only as monthly averages. The technical aspects of our data analysis is described in Sect. 4.3.

For book and album charts the ranking criteria are the number of sales and downloads within the respective chart periods. In analogy, we used the number of comments for Reddit posts. In Figure 5 the chart diversity and the respective lifetimes for 10- and 60-min Reddit top-100 charts are shown. Within the time window analyzed, 2013-2015, a downward trend in chart diversity is observed, with an equivalent increase in chart lifetimes. This is in contrast to the long-term trend for book and album charts that can be observed in Figure 1.

The distributions for 10- and 60-min chart lifetimes presented in Figure 5 is non-monotonic. On a coarse level, the probability to observe a certain lifetime drops in a power law-like fashion, which is however interseeded by a pronounced local maximum. The maxima correspond to characteristic time scales of about 20-26 hours, for both the 10- and the 60-min charts, which suggests that the intrinsic 24 hours day-night activity cycle may be involved. Users may sleep on a post or comment of the day, in order to revisit it in the next morning. A lesson learned from the Reddit database is then, that the presence of characteristic time scales may interfere strongly with the otherwise operative feedback loop between posts and comments.

3.5 Twitter Hashtags

We use a publicly available corpus of Twitter hashtag statistics to study the residence time of hashtags in hourly, daily and weekly top-50 charts. See Sect. 4.4 for data source and handling. Hashtags and Reddit comments are both examples of active user contributions, albeit with a key difference. Adding a hashtag to a tweet can be presumed to be on the average less time-consuming than composing an entire Reddit comment. We may hence expect the trend towards self-organization to be more pronounced for Twitter hashtag charts, than for Reddit comments.

In Figure 3 the distribution of Twitter hashtags for hourly, daily and weekly top-50 charts are presented. With respect to the Reddit data shown in Figure 5 one observes that the 24 hour activity cycle is now substantially less pronounced. Instead of a peak, the dominant feature of the lifetime distribution of the hourly Twitter hashtag charts is a kink. To be precise, only a weak local maximum occurs at about 18h for the hourly charts, and a 30% drop at 24h. For daily and weekly charts no anomaly is observed. Fitting the Twitter hashtag data with the maximum entropy distribution (3), we decided to split the 1h lifetime data in two parts, below and above the above discussed kink. Within the log-log representation, one finds that the quadratic contribution tends to become smaller for longer charting periods, viz when going from 1h to daily and weekly charts. This observation complies with the argumentation laid out in Sect. 2.3, namely that lifetime distributions become more power law-like when the time scale of the individual activities is substantially smaller than the charting periods.

4 Data sources and processing

4.1 Book and album charts

The New York Times book best sellers (New York Times Best Sellers, 2020), the Spotify music streaming charts (Spotify Charts, 2020) and the Billboard sales-based music charts (Billboard Charts, 2020) have been obtained from public internet sources. We also examined the statistics of the set of Twitter hashtags compiled by Lorenz-Spreen et al. (2019).

The algorithm used for the compilation of the Billboard charts has been adjusted over time, with a major update in 2014/15. At that point the traditional sales-based ranking was substituted by a ranking based on a multi-metric consumption rate, which includes weighted song streaming. This update, which took effect at the end of 2014, affects the chart statistics profoundly.

The original Billboard album chart, the Billboard Top 200, was retained after the 2014/15 metric update under a new name, as ‘Top Album Sales’. Data continuity is consequently achieved when using the Top Album Sales charts from 2014/15 on. Whenever possible, we show results obtained from both the Billboard Top 200 and the Top Album Sales charts, where the latter are compiled according to the unaltered sales-based ranking rules, albeit only for 100 ranks.

For data analysis we used statistically relevant binning, which consists, as explained in Sect. 4.5, of adjusting bin sizes dynamically such that a minimal number of Nmin(data)superscriptsubscript𝑁mindataN_rm min^rm(data)italic_N start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( roman_data ) end_POSTSUPERSCRIPT data points per bin is obtained.

4.2 New York Times Best-seller List

In contrast to the Billboard magazine, the New York Times does not publish a full history of their best-seller list. Only recent lists are published on the official New York Times website (New York Times Best Sellers, 2020). Consequently, the analysis presented here relies on a republication of the data by Hawes Publications (2020), which collected and republished the weekly best-seller list from the 1950s until today. Since the length of the best-seller list changed repeatedly, all analysis above consider a top 10 ranking.

4.3 Reddit data analysis

Reddit data was downloaded from the Reddit Database (2020). Due to the large data size, we restricted our analysis to two six-month periods, the first half of 2013, which contains 19.050.122 post and 187.385.130 comments, and the second half of 2015, with a total of 37.038.895/347.236.797 entries. Comments of posts receiving less than 1000 comments over the observation period were removed from the data, since they would not show up in the charts analyzed. The cleaned comment data was separated into time slices based on the comment timestamps and aggregated by post IDs (identification labels). In this way charts for consecutive 10 minute and 60 minute slices where generated, as based on the number of comments received. Post lifetimes were evaluated from the respective top-100 charts.

4.4 Twitter Hashtags

The hourly ranking of the 50 most common hashtags were gathered by Mønsted (2019). This dataset, which has been used to study the acceleration of collective attention (Lorenz-Spreen et al., 2019), contains the 50 most common hashtags out of a 10% sample of all tweets, which were gathered every hour during the period from 2013 through 2016. For every hashtag the dataset also shows the number of times it appeared during the last hour in the 10% sample. This additional information allows, modulo the top-50 cut-off, to compile charts with longer observation intervals, e.g. daily or weekly charts.

4.5 Adaptive binning

For the generation of probability distribution functions from collected data one needs to group events together into bins. When sets of bins with predefined width are used, the number of events may vary strongly from one bin to another. It is hence advantageous, for statistical relevant binning, to adjust the width of the individual bins dynamically, until a minimum of Nmin(data)superscriptsubscript𝑁mindataN_rm min^rm(data)italic_N start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( roman_data ) end_POSTSUPERSCRIPT data points per bin has been reached. The binning procedure is finished once this is not any more possible. Comparing results obtained for different Nmin(data)superscriptsubscript𝑁mindataN_rm min^rm(data)italic_N start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( roman_data ) end_POSTSUPERSCRIPT, as done in Figure 5, allows to gauge the accuracy.

Consumption charts, like sales and streaming charts, show remarkable similarities between different cultural products, e.g. when comparing music albums and literature. The evolution of the respective chart diversities, as well as the probability that a number-one title debuts as such, follow parallel courses for these two cultural goods. Similarly, one can fit the distribution of on-chart residence times, the respective lifetimes, in all cases by a generalized log-normal distribution that can be shown to maximize the information content of the lifetime distribution, to the extent as it is stored in the brain. Taking the long-term perspective, one notices distinct features for the evolution of the lifetime distribution of the New York Times best-seller list, and for album sales charts.

For music albums (Schneider & Gros, 2019), the lifetime distribution evolves from a log-normal distribution, as before the 1990s, to a power law, as for nowadays. In this context it is important to notice that power laws belong to the class of generalized log-normal distributions, albeit with a vanishing quadratic term in the exponent. The information-theoretical arguments presented in this study for the occurrence of log-normal distributions suggests that a transition from fully log-normal to a power law takes place when uncertainties, viz the variance, fade into secondary importance. This phenomenon is related to the time needed on the average for decision-making, given that it takes longer to gauge variances than means.

In societies, some time scales remain constant, others are subject to a secular acceleration process (Rosa, 2013). An example of a non-changing time scale is the period of typically 4-5 years between general elections, which contrasts with an accelerating opinion dynamics (Gros, 2017). Of relevance for the present study is the charting period of one week, which did not change since the inception of classical music charts and book bestseller lists. In contrast to the charting period, the decision time to buy an album is likely to have fallen substantially with the rise of the internet. Here we have argued that this development is reflected in the respective chart statistics. It is less clear if the same holds for the time people need to read a book.

Modern streaming charts can be used as a test bed for the relative time scale hypothesis. The available high temporal resolution allows narrowing down the time individuals need to decide to listen to an album, and then to actually do so. The lifetime distributions of Spotify album charts show that this time scale is nowadays between one week and one day. For a single song the decision can be made comparatively faster, which leads for both weekly and daily single charts to power law lifetime distributions.

Activity charts, such as Twitter hashtags or Reddit postings, differ in certain aspects from consumption charts. Human activities are subject to a day-night cycle, which shows up for the case of commenting on Reddit as a resurgence after around 24 hours. Previous-day posts are likely to be revisited after a night of sleep, as evident in Figure 5. This phenomenon, the 24h-cycle, will mask underlying power laws, if existent. The Twitter charts provide evidence for an equivalent occlusion mechanism, see Figure 3.

The lifetime distribution of hourly hashtag charts is characterized by a peak at 18 hours, together with a subsequent drop at 24 hours. Charts averaging over one or more full days, daily and weekly charts, are represented instead by smooth distributions. One finds indications for power laws with minor quadratic corrections. These results suggest that power laws appear ubiquitously for long-enough charting periods. In conclusion we believe that the data examined supports the basic hypothesis presented here, namely that aggregated human decision processes exhibit the pronounced statistical features characteristics of compressed information maximization.

Beyond information maximization, we found marked changes at the top of cultural charts, which started with the raise of the internet in the 1990s. This phenomenon regards the route to become a number-one hit, which took substantially longer in the past. We believe that this observation deserves further investigations, which would however transcend our present framework.

The work of BS was supported by the research grant of the Romanian National Authority for Scientific Research and Innovation, CNCS/CCCDI-UEFISCDI, project nr. PN-III-P4-ID-PCCF-2016-0084.

Data availability

All data is available from publicly accessible sources, as described in Sect. 4.

My Website: https://premiumebooks.net/?post_type=post&p=23

Notes is a web-based application for online taking notes. You can take your notes and share with others people. If you like taking long notes, notes.io is designed for you. To date, over 8,000,000,000+ notes created and continuing...

With notes.io;

* You can take a note from anywhere and any device with internet connection.
* You can share the notes in social platforms (YouTube, Facebook, Twitter, instagram etc.).
* You can quickly share your contents without website, blog and e-mail.
* You don't need to create any Account to share a note. As you wish you can use quick, easy and best shortened notes with sms, websites, e-mail, or messaging services (WhatsApp, iMessage, Telegram, Signal).
* Notes.io has fabulous infrastructure design for a short link and allows you to share the note as an easy and understandable link.

Fast: Notes.io is built for speed and performance. You can take a notes quickly and browse your archive.

Easy: Notes.io doesn’t require installation. Just write and share note!

Short: Notes.io’s url just 8 character. You’ll get shorten link of your note when you want to share. (Ex: notes.io/q )

Free: Notes.io works for 14 years and has been free since the day it was started.

You immediately create your first note and start sharing with the ones you wish. If you want to contact us, you can use the following communication channels;

Email: [email protected]

Twitter: http://twitter.com/notesio

Instagram: http://instagram.com/notes.io

Facebook: http://facebook.com/notesio

Regards;
Notes.io Team

Notes

Notes - notes.io

Shortened Note Link

Long File

Notes