Why the logarithm in entropy
Note that, from (21), and then from (23) and (20), the ansatz in (26) solves the defining equation. Using a property that is natural for a logarithmic function, the remaining constant is determined. The resulting explicit expression agrees with the form obtained earlier, and the preceding equation can be rewritten accordingly.
This can be easily verified. Graphs of the three-parameter logarithm for several parameter values are shown in Figure 1, while graphs with one parameter held fixed are shown in Figure 2.
In this section, the inverse of the three-parameter logarithmic function is derived. It is also verified that the derivative of this logarithm at x = 1 equals 1 and that the value of the function at x = 1 is zero.
Moreover, it is shown that the following equality holds. It follows from (16) that the three-parameter logarithmic function is an increasing function of x. Thus, a unique inverse function exists.
Theorem 3. The inverse of the three-parameter logarithmic function is the three-parameter exponential. To find the inverse function, set y equal to the three-parameter logarithm of x and solve for x; the resulting inverse is expressed through the three-parameter exponential defined in (8). Theorem 4. The three-parameter logarithm satisfies four properties; in particular, its slope is positive throughout its domain.
To find the derivative, use (17). From (40), the slope of the three-parameter logarithm is positive throughout its domain; this is also observed in Figures 1 and 2. Part 4 of the theorem follows from the indicated substitutions and a result in [5]. A three-parameter generalization of the Boltzmann-Gibbs-Shannon entropy is constructed here, and its properties are proved. Based on the three-parameter logarithm, the entropic function is defined accordingly; for equiprobable states it reduces to a function of the number of states W. The functional form given in the previous section is analytic in the probabilities, as the three-parameter logarithm is analytic in its argument.
Consequently, the three-parameter entropy is Lesche-stable. An entropic function satisfies the next condition if a zero-probability state does not contribute to the entropy; that is, the condition must hold for any distribution. Observe that, in the limit considered, the expression remains finite if one of the parameters is greater than 1. Consequently, the condition is satisfied provided that one of the parameters is greater than 1. Concavity of the entropic function is assured under the following condition.
Theorem 5. The three-parameter entropic function is concave provided the parameters satisfy the stated condition. By a somewhat tedious manual calculation, the second derivative in (47) is obtained. In one limit, this second derivative is negative under the stated condition, so concavity is guaranteed there; in the other limit, concavity is likewise guaranteed under the corresponding condition.
In the remaining case, concavity holds under the corresponding condition. A twice-differentiable function of a single variable is convex if and only if its second derivative is nonnegative on its entire domain. The analysis of the convexity of the entropic function is analogous to that of its concavity.
In one limit, convexity is guaranteed under the corresponding condition; in the other limit, convexity is likewise assured. Thus, we have the following theorem. Theorem 6. The three-parameter entropic function is convex provided the parameters satisfy these conditions.
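The paper's three-parameter entropy is not reproduced in this extract, so as a hedged illustration of the kind of concavity check behind Theorems 5 and 6, the sketch below numerically tests the standard one-parameter Tsallis entropy (a simpler stand-in, not the paper's function) on two-state distributions using second differences; the function name and parameter values are ours.

```python
import numpy as np

def tsallis_entropy(p, q):
    """One-parameter Tsallis entropy S_q(p) = (1 - sum(p_i^q)) / (q - 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # zero-probability states contribute nothing
    if abs(q - 1.0) < 1e-12:          # q -> 1 recovers the Boltzmann-Gibbs-Shannon form
        return -np.sum(p * np.log(p))
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

# Check concavity on two-state distributions (p, 1 - p) via second differences.
for q in (0.5, 1.0, 2.0):
    ps = np.linspace(0.01, 0.99, 199)
    s = np.array([tsallis_entropy([p, 1 - p], q) for p in ps])
    second_diff = s[:-2] - 2 * s[1:-1] + s[2:]
    print(f"q = {q}: max second difference = {second_diff.max():.3e} (<= 0 indicates concavity)")
```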
Concavity of the entropic function is illustrated in Figure 3(a), while convexity is illustrated in Figure 3(b). An entropic function S is said to be composable if, for independent events A and B, S(A ∪ B) = F(S(A), S(B)) for some single-valued function F [5].
The Boltzmann-Gibbs-Shannon entropy satisfies S(A ∪ B) = S(A) + S(B) for independent A and B; hence, it is composable and additive. The one-parameter (Tsallis) entropy S_q is also composable, as it satisfies S_q(A ∪ B) = S_q(A) + S_q(B) + (1 - q) S_q(A) S_q(B). The two-parameter entropy [5] satisfies an analogous relation in the microcanonical ensemble (i.e., for equiprobable states); however, this does not hold for arbitrary distributions, which means it is not composable in general.
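The one-parameter composability relation quoted above is the standard Tsallis pseudo-additivity, and it can be checked numerically for independent systems. This is a minimal sketch using the standard Tsallis form and made-up distributions; it is not the paper's two- or three-parameter entropy.

```python
import numpy as np

def tsallis_entropy(p, q):
    """One-parameter Tsallis entropy S_q = (1 - sum p_i^q) / (q - 1)."""
    p = np.asarray(p, dtype=float)
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

q = 1.7
pA = np.array([0.2, 0.3, 0.5])        # distribution of system A (made up)
pB = np.array([0.6, 0.4])             # distribution of an independent system B (made up)
pAB = np.outer(pA, pB).ravel()        # joint distribution of independent A and B

lhs = tsallis_entropy(pAB, q)
rhs = tsallis_entropy(pA, q) + tsallis_entropy(pB, q) \
      + (1 - q) * tsallis_entropy(pA, q) * tsallis_entropy(pB, q)
print(lhs, rhs)                       # the two values agree, confirming pseudo-additivity
```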
For the three-parameter entropy, a property similar to that in (51) is obtained, as shown in the following theorem. Theorem 7. The three-parameter entropy satisfies a composition relation of this kind. The proof follows by expanding the defining expressions and applying a result from [5] to equation (58), after which the claimed relation is obtained. In view of the noncomposability of the two-parameter entropy, the three-parameter entropy is likewise not composable in general.
In summary, a three-parameter entropic function has been defined and its properties proved. It will be interesting to study the applicability of the three-parameter entropy to adiabatic ensembles [13] and other ensembles [14], and how these applications relate to the generalized Lambert W function. The computer programs used to generate the graphs and the data supporting the findings of this study are available from the corresponding author upon request.
The highest entropy is attained by a system with many rare and specific messages.
The lowest entropy is attained with frequent and general messages. In between, we have a spectrum of entropy-equivalent systems, which might have both rare and general messages, or frequent but specific ones. There is a profound reason why the logarithm comes into the picture; it is not chosen at random. A 2-valued letter is nothing else but a bit. Now, the information about an outcome that travels from the sender to the receiver is actually the codeword that represents the outcome.
Finally, what would be the information content of all the outcomes combined? In other words, what is the information content of a system that can be in different states with different probabilities? The answer is that each outcome or state adds its information to the system, but only in the ratio of how much of it is there, i.e., weighted by its probability; this weighted sum is sketched below.
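As a minimal sketch of that weighted sum, the snippet below computes the surprise of each outcome and then the probability-weighted average; the four-state distribution is a made-up example.

```python
import math

# Hypothetical system with four states and their probabilities (they sum to 1).
probabilities = [0.5, 0.25, 0.125, 0.125]

# Information content (surprise) of a single outcome, in bits: -log2(p).
surprise = [-math.log2(p) for p in probabilities]

# Entropy: each outcome contributes its surprise weighted by how often it occurs.
entropy = sum(p * s for p, s in zip(probabilities, surprise))
print(surprise)   # [1.0, 2.0, 3.0, 3.0] bits
print(entropy)    # 1.75 bits on average per outcome
```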
OK, no maths. I too was curious to know the same thing, and here is how I came to understand the purpose served by the logarithm in the entropy equation. Let's use a simple example. Suppose we are looking at 3 different containers, each holding some triangles and circles. Let's focus on the first one, Container 1, which has 26 triangles and 4 circles. If you put your hand inside the container and picked one item, what is the chance that you pick a triangle or a circle?
If you were allowed to keep picking until nothing was left in Container 1, then after 30 picks you would have 26 triangles and 4 circles in hand. So on any single pick you are far more likely (26 out of 30) to pull out a triangle; the exact opposite can be said about picking a circle.
This likelihood is what we usually term a probability. In the same way, you have more doubt (you are less certain) about picking a circle. This uncertainty is what is usually termed entropy, which in this intuitive sense runs opposite to probability.
You have more confidence that you'll pick a triangle, and even when a circle turns up now and then, it does not change the overall picture much because circles are so few. In other words, entropy (the element of surprise, the chance that you get something other than what you expected) will be lower for Container 1.
Following the same line of thought, in Container 2 it is equally likely that you pick a triangle or a circle on any draw; you have the same confidence about picking, or the same uncertainty about not picking, either shape. So entropy (the element of surprise, the chance that you will get something other than what you expected) is higher for Container 2. Container 3, in contrast, holds only circles: whenever you pick something out, you have absolutely no doubt that it will be a circle, so its entropy is zero.
How can one put a number on the probability and entropy of Container 1, Container 2, or Container 3 as a whole? It would be ideal to have that number lie between 0 and 1, so that it could easily be expressed as a percentage.
I need some way to scale the counts (make them bigger or smaller) so that they fall between 0 and 1, and this scale should be the same irrespective of the number of items (e.g., whether a container holds 30 items or 300). The logarithm provides this scale: logarithms offer a way to represent numbers, especially large ones, in reduced or scaled-down form.
To keep my scale independent of the container (because the total number of items can vary between containers), one option is to use 2 as the base for the logarithms. Any base (2, 10, e) will suffice, as long as every container is measured using the same base. Specific to our examples, we either pick a triangle or we don't (we pick a circle otherwise), so we have only 2 outcomes for any pick. Base 2 also matches the two choices (0, 1) used in computers, hence it has been the choice of base for the logarithm.
Using this fact about logarithms: if we multiply each ratio (a shape's count divided by the total) by the log of that ratio and sum the results, we get a quantity that can be scaled to lie between 0 and 1. Note: since each ratio is less than 1, its logarithm is negative, so the product ends up less than 0; the negative sign is added to bring it back to positive, and dividing by an appropriate logarithm keeps the new scaled value between 0 and 1. A sketch of this calculation for the three containers follows.
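Here is a small sketch of that calculation for the three containers, assuming the normalization divides by log2 of the number of shape types (one reasonable reading of the note above) and assuming Container 2 holds an even mix and Container 3 holds only circles.

```python
import math

def normalized_entropy(counts):
    """Entropy of the mix in a container, scaled to [0, 1] by dividing by log2
    of the number of item types (here 2: triangle and circle)."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c == 0:
            continue                 # empty categories add nothing
        p = c / total
        h -= p * math.log2(p)        # the minus sign flips the negative logs
    return h / math.log2(len(counts))

print(normalized_entropy([26, 4]))   # Container 1: ~0.57, fairly predictable
print(normalized_entropy([15, 15]))  # Container 2 (assumed 50/50 mix): 1.0, maximum uncertainty
print(normalized_entropy([0, 30]))   # Container 3 (assumed all circles): 0.0, no uncertainty
```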
Hope this explanation helps you both visualize and build an intuition about entropy and the choice of logarithms: statistical functions almost always use logarithms to scale ratios and large numbers down toward a common range, representable between 0 and 1.
What is the role of the logarithm in Shannon's entropy?
Why can't we calculate entropy without the logarithm? If you don't care about the subtle information-related properties of Shannon entropy, you can use any of the alternative indices, though they weight low and high probabilities differently. That suggests you haven't yet clearly articulated your question, because it sounds like you have some unstated concept of "entropy" in mind.
Please don't keep us guessing: edit your question so that your readers can provide the kinds of answers you are looking for. Is it that the logarithm essentially moves us from a diversity index to an information index, measuring the number of bits we need to tell the events apart? A small sketch contrasting the two follows.
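As a rough illustration of that distinction, the sketch below compares the Gini-Simpson diversity index (no logarithm) with Shannon entropy in bits on a few made-up distributions; only the log-based measure grows by exactly one bit when the number of equally likely outcomes doubles.

```python
import math

def gini_simpson(p):
    """Diversity index without a log: probability that two draws differ, 1 - sum p_i^2."""
    return 1 - sum(x * x for x in p)

def shannon_bits(p):
    """Information index: expected number of bits needed to tell outcomes apart."""
    return -sum(x * math.log2(x) for x in p if x > 0)

uniform4 = [0.25] * 4
uniform8 = [0.125] * 8
skewed   = [0.7, 0.1, 0.1, 0.1]

for name, dist in [("uniform over 4", uniform4), ("uniform over 8", uniform8), ("skewed", skewed)]:
    print(f"{name}: Gini-Simpson = {gini_simpson(dist):.3f}, Shannon = {shannon_bits(dist):.3f} bits")
# Going from 4 to 8 equally likely outcomes adds exactly one bit (2 -> 3),
# while the Gini-Simpson index moves from 0.75 to 0.875: only the log scale is additive.
```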
The logarithmic measure is more convenient for various reasons: (1) It is practically more useful. Parameters of engineering importance such as time, bandwidth, number of relays, etc., tend to vary linearly with the logarithm of the number of possibilities. For example, adding one relay to a group doubles the number of possible states of the relays. It adds 1 to the base 2 logarithm of this number.
Doubling the time roughly squares the number of possible messages, or doubles the logarithm, etc. (2) It is nearer to our intuitive feeling as to the proper measure. This is closely related to (1), since we intuitively measure entities by linear comparison with common standards.
One feels, for example, that two punched cards should have twice the capacity of one for information storage, and two identical channels twice the capacity of one for transmitting information. (3) It is mathematically more suitable. Many of the limiting operations are simple in terms of the logarithm but would require clumsy restatement in terms of the number of possibilities. (Source: Shannon, A Mathematical Theory of Communication [pdf].)
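A quick numeric check of the "twice the capacity" intuition, under the assumption of two independent, identical sources (the 12-pattern "card" is a toy stand-in):

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

card = [1 / 12] * 12                 # one punched card with 12 equally likely patterns (toy numbers)
h_one = entropy_bits(card)

# Two independent cards: every pair of patterns is a possible joint state.
two_cards = [pa * pb for pa in card for pb in card]
h_two = entropy_bits(two_cards)

print(h_one, h_two)                  # h_two equals 2 * h_one (up to floating point)
# The number of possibilities multiplied (12 -> 144), but the logarithmic
# measure simply added, matching the intuition of "twice the capacity".
```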
This is why the information is reported the way it is. There is an alternative quantity, the "perplexity", which reports information without the log, and it makes intuitive sense; a small sketch follows.
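A small sketch of perplexity as 2 raised to the entropy, i.e. the same information reported as an effective number of equally likely outcomes; the dice probabilities are made up.

```python
import math

def entropy_bits(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def perplexity(p):
    """2 ** entropy: the 'effective number of equally likely outcomes'."""
    return 2 ** entropy_bits(p)

fair_die   = [1 / 6] * 6
loaded_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]

print(perplexity(fair_die))     # 6.0   -> behaves like a 6-way uniform choice
print(perplexity(loaded_die))   # ~4.47 -> behaves like roughly a 4.5-way uniform choice
```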
But why the log transform? The transform makes sense if you use a certain "medium" to tell the events apart (say the binary-digit medium, i.e. bits), but essentially, why should we define entropy by creating some arbitrary medium and then using that medium to tell the events apart? It looks like we are taking a long-distance trip just to count the states by introducing the log. Continuous entropy is usually finite; that's the problem with this answer: it gives you some intuition, but it fails you in reasoning about the general case. I have found the following references useful in addition to those listed elsewhere: Probability Theory: The Logic of Science by E. T.
Jaynes, who is one of the few authors who derive many results from scratch; see the relevant chapter. Another text contains an in-depth analysis of Shannon's source coding theorem; see its Chapter 4. Yes: the resolution of your difficulty is to combine separate problems. I'll illustrate. Don't predict one die at a time: predict, say, five at a time. I toss the dice and sum the total number of questions I ask across all of them. You need to ask a set of questions designed to obtain the state of all the dice at once; a small numeric sketch of this batching follows.
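A numeric sketch of that batching argument, assuming yes/no questions and a binary search over the joint outcomes (so the worst-case question count is the ceiling of the base-2 log of the number of states):

```python
import math

SIDES = 6                         # a fair six-sided die

def questions_needed(num_dice):
    """Worst-case number of yes/no questions to pin down the joint outcome of
    `num_dice` dice when they are predicted together (binary search over states)."""
    states = SIDES ** num_dice
    return math.ceil(math.log2(states))

for n in (1, 5, 100):
    q = questions_needed(n)
    print(f"{n} dice: {q} questions total, {q / n:.3f} questions per die")
# 1 die   -> 3 questions per die
# 5 dice  -> 13 questions, 2.600 per die
# 100 dice -> 259 questions, 2.590 per die, approaching log2(6) ~ 2.585
```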
That is not the same thing as the average number of questions needed to find the state of one die at a time. The question was: what purpose does the logarithm serve in this equation?
The logarithm (usually base 2) appears because of Kraft's inequality; a minimal sketch of that connection is given below.
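A minimal sketch of that connection, using Shannon code lengths (the ceiling of -log2 p) for a made-up source: the lengths satisfy Kraft's inequality, so a prefix code with those lengths exists, and the average codeword length sits between the entropy and the entropy plus one.

```python
import math

# Toy source: symbols with their probabilities.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Shannon code lengths: l_i = ceil(-log2 p_i). A prefix code with these lengths
# exists precisely because they satisfy Kraft's inequality sum 2^(-l_i) <= 1.
lengths = {s: math.ceil(-math.log2(p)) for s, p in probs.items()}
kraft_sum = sum(2 ** -l for l in lengths.values())

entropy = -sum(p * math.log2(p) for p in probs.values())
avg_len = sum(probs[s] * lengths[s] for s in probs)

print(lengths)            # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print(kraft_sum)          # 1.0 <= 1, so a prefix code exists
print(entropy, avg_len)   # entropy 1.75 bits; average length 1.75, within one bit of the entropy
```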
For example, consider three events when rolling a black die and a white die: (1) the white die is odd, (2) the white die is even and the black die is less than three, and (3) otherwise. Either the dice are rolled together, or else the white die is rolled first, and then the black die only if necessary. For a proof, see Appendix 2 in C. E. Shannon, A Mathematical Theory of Communication. So if we merely count the number of states a system could have, that count already satisfies the three properties you mentioned.
Say a die has 6 sides, so we can count 6 units of information, while a coin has 2 sides, so 2 units of information. Comparing 6 with 2, we can easily deduce that the die carries more information, fulfilling the monotonicity property you stated. This counting can also be done in the continuous domain by integrating (measure theory). So counting is conditionally independent of how the choice is split; a small sketch contrasting raw counts with their logarithms follows this comment.
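A small sketch contrasting raw state counts with their base-2 logarithms for a die, a coin, and the combined system; independent counts multiply while their logarithms add.

```python
import math

die_states, coin_states = 6, 2
combined_states = die_states * coin_states     # independent choices multiply: 12 states

print(math.log2(die_states))                   # ~2.585 bits for the die
print(math.log2(coin_states))                  # 1 bit for the coin
print(math.log2(combined_states))              # ~3.585 bits = 2.585 + 1
# Raw counts stay monotone (6 > 2) but combine multiplicatively (6 * 2 = 12);
# their logarithms are still monotone and combine by simple addition.
```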
See Appendix 2 in my citation. But it goes back even further, for instance to this classic paper of C. S. Peirce, which discussed at length why the logarithm is used: "It is that our belief ought to be proportional to the weight of evidence, in this sense, that two arguments which are entirely independent, neither weakening nor strengthening each other, ought, when they concur, to produce a belief equal to the sum of the intensities of belief which either would produce separately."
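In modern terms, Peirce's additivity is the statement that independent pieces of evidence multiply likelihood ratios, so their logarithms (weights of evidence) add. A minimal sketch with made-up likelihood ratios:

```python
import math

# Hypothetical independent pieces of evidence, each given as a likelihood ratio
# P(evidence | hypothesis) / P(evidence | not hypothesis).
lr1, lr2 = 4.0, 2.5

# Combined evidence: likelihood ratios multiply for independent observations.
combined_lr = lr1 * lr2

# The "weight of evidence" is the log of the likelihood ratio; the weights add.
w1, w2 = math.log(lr1), math.log(lr2)
print(combined_lr)                       # 10.0
print(w1 + w2, math.log(combined_lr))    # the two values agree: the log turns products into sums
```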
I. J. Good also comments on entropy: "While the manuscript was with the publishers an article appeared involving ideas that are related in some ways to those of the present chapter." He then goes on to explain how concepts from the Shannon paper relate to his book.
They ask: why is it there? This is the paragraph that explains why the formula has -log p instead of 1 - p: "Before further unpacking the formal definition of entropy, one would be justified in asking why not simply choose 1 - p instead of [-log p] as the most appropriate measure of nonexistence?" A small numeric check follows.
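A quick numeric check of why -log p, and not 1 - p, behaves well here: for two independent events the -log p surprises add, while the 1 - p values do not. The probabilities are made up.

```python
import math

p1, p2 = 0.5, 0.25          # two independent events
p_joint = p1 * p2           # probability that both happen

# Surprise measured as -log2(p): the joint surprise is the sum of the parts.
print(-math.log2(p_joint))                      # 3.0
print(-math.log2(p1) + -math.log2(p2))          # 1.0 + 2.0 = 3.0

# Surprise measured as 1 - p: the joint value is NOT the sum of the parts.
print(1 - p_joint)                              # 0.875
print((1 - p1) + (1 - p2))                      # 0.5 + 0.75 = 1.25
```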
Note that at that time, he had not yet settled on "entropy" as a name: "My greatest concern was what to call it." So my answer is: there is no deep reason for this.