Can the Data on Russian Losses Published by the Hochu Zhit Project Be Trusted? A CIT Analysis
On Oct. 6, the Ukrainian project Hochu Zhit [I Want to Live] published what it said were figures on Russian military losses in the war against Ukraine since the beginning of 2025. If the data is authentic, the document could represent a unique source, offering, for the first time, a full picture of the Russian Armed Forces’ casualties, including the ratio of killed, wounded, missing and captured soldiers across the entire army, rather than within units or formations. Until now, researchers have had no comparable source, so it is unsurprising that this article drew significant attention. Our team therefore decided to examine the data and assess whether it can be considered reliable.
The source material consists of photographs of a two-page table listing personnel and equipment losses, broken down by six groups of troops. Within each group, the table details losses by subordinate formations and units, as well as indirectly subordinate elements (listed under "ЧНП"). The project did not explain where the data came from, nor is it clear whether the table is an original document or a compilation of information gathered independently or perhaps provided by Ukrainian intelligence.
If we assume that the table is an original document, its format appears highly uncharacteristic of the Russian Ministry of Defense and raises numerous questions. Documents of this level (covering a joint military grouping) typically carry a security classification—"secret" or even "top secret"—along with a date, an outgoing number, the total number of pages and the number of copies. They also include information about the document’s recipient, author and responsible officer. While some of these details might theoretically lie outside the frame of the photograph, the complete absence of any identifying or procedural markings already casts serious doubt on its authenticity.
If, however, the document is not an original but rather a summary of information known to the Hochu Zhit project, the only remaining way to assess its credibility is through the data itself. For that purpose, we conducted a statistical analysis of the figures provided.
The distribution of the last digit has been used to detect data falsifications in scientific research since the mid-1990s, and in electoral statistics since 2008 (in Russia, since 2009). The method compares how often each digit appears in the last position across the set of numbers under analysis. It relies on a well-established finding from psychological studies: when people are asked to generate random numbers, they tend to produce certain digits more frequently than others, revealing their unconscious preferences.
When analyzing electoral data by the distribution of the last digit, small precincts (with fewer than 100 voters) are often excluded, because one-digit numbers and the lower-order digits of two-digit numbers follow the distributions of the first and second significant digits described by Benford's Law rather than a uniform distribution. By contrast, the distribution of the third and subsequent significant digits is practically indistinguishable from uniform. For this reason, we likewise excluded all one-digit numbers (28 in total, not counting zeros) and two-digit numbers (69 in total) from our analysis. Our results, however, are practically insensitive to the specific cutoff threshold.
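As a rough sketch, this filtering-and-counting step can be expressed in a few lines of Python. The sample values below are purely illustrative and are not taken from the published table:

```python
from collections import Counter

def last_digit_counts(values, min_value=100):
    """Count how often each digit appears in the last position,
    excluding numbers below min_value, whose low-order digits
    follow Benford-like rather than uniform distributions."""
    return Counter(v % 10 for v in values if v >= min_value)

# Illustrative values only -- not the actual figures from the table.
sample = [104, 215, 337, 1268, 93, 7, 452, 531]
print(last_digit_counts(sample))  # 93 and 7 are excluded by the cutoff
```

The cutoff threshold (`min_value=100` here) mirrors the exclusion of one- and two-digit numbers described above.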
In Russian electoral statistics, an abnormally high proportion of numbers ending in zero or five often serves as a reliable indicator of falsifications. A similar anomaly can be observed in the Hochu Zhit project data; in this case, however, there is an abnormally high number of fives but not zeros, while ones and twos also appear far more frequently than they should.
As Mediazona [independent Russian media outlet] notes, this may result from data falsification at the source: Russian officers compiling reports for submission "upwards" may themselves manipulate the figures. However, that hypothesis can explain only the anomalies in the frequencies of 0 and 5. The eight-month totals were presumably derived from shorter reporting intervals, such as monthly reports, which are almost certainly prepared at different headquarters by individuals with varying unconscious digit preferences. Aggregation should therefore smooth out any individual biases, leaving all digits other than 0 and 5 approximately uniformly distributed. Indeed, as we show below, this evening-out occurs even when summing numbers produced by a single author.
After excluding one-digit and two-digit numbers, 218 values remained, of which 44 ending in 0 or 5 were also omitted to eliminate the possible influence of manipulation by Russian officers. In the remaining 164 numbers ending in 1, 2, 3, 4, 6, 7, 8, or 9, each digit should therefore account for approximately one-eighth of the dataset (12.5%, or about 20.5 of the 164 numbers). In reality, however, we obtained a different result:
The resulting distribution deviates significantly from a uniform one. The shares of ones and twos are inflated, whereas sixes and sevens are markedly underrepresented: their combined total is smaller than that of twos alone. The probability that such a pattern could have arisen by chance was estimated using the Pearson chi-square goodness-of-fit test, yielding a p-value of approximately 0.013 (see the appendix). This means that the likelihood of obtaining such a distribution by random chance is only about 1.3%. In principle, this result alone could suffice, but we continued the analysis to better understand how the document had been compiled.
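The uniformity check itself reduces to a Pearson chi-square goodness-of-fit test, which needs only the standard library. The counts below are hypothetical placeholders chosen to mimic the described skew (inflated ones and twos, depressed sixes and sevens); they are not the published table's actual counts:

```python
def chi_square_stat(observed):
    """Pearson chi-square statistic against a uniform expectation."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical counts for digits 1, 2, 3, 4, 6, 7, 8, 9 (0 and 5 already
# excluded); placeholders only, not the actual data.
observed = [35, 30, 18, 17, 12, 11, 20, 21]  # sums to 164
stat = chi_square_stat(observed)

# With 8 categories there are 7 degrees of freedom; the 5% critical
# value of the chi-square distribution with df = 7 is about 14.07.
print(f"chi2 = {stat:.2f} (reject uniformity at 5% if > 14.07)")
```

Converting the statistic to an exact p-value requires the chi-square survival function (e.g. `scipy.stats.chisquare`); comparing against the tabulated critical value is the stdlib-only equivalent of the same decision.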
To do this, we divided the available data into non-aggregated (136 numbers greater than 99, of which 107 do not end in 0 or 5) and aggregated (82 numbers greater than 99, of which 57 do not end in 0 or 5)—that is, those resulting from summing values within the table itself. The latter category included the "Total" column, the "Total by Joint Military Grouping" row, and the subtotal rows for each group of troops (except the final "Recoverable" column). All other data were classified as non-aggregated.
Our analysis showed that the aggregated data appear random (p = 0.64), whereas for the non-aggregated data the probability of obtaining the observed distribution by chance drops from 1.3% to 0.4%.
Thus, it appears that the author of the published document first filled in the data for individual units and then summed them to obtain the aggregated figures. As a result, the distribution of all digits other than 0 and 5 in the aggregated data became "averaged out" and uniform—precisely as we suggested earlier when discussing summing monthly reports.
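This "averaging out" effect is easy to reproduce in a simulation. Below, a single simulated author strongly over-produces 1, 2, and 5 as last digits; the raw numbers show a visibly skewed last-digit distribution, yet the last digits of sums of eight such numbers are already close to uniform. All values here are synthetic, invented purely for illustration:

```python
import random
from collections import Counter

random.seed(42)

# One "author" whose invented numbers over-represent 1, 2 and 5
# as last digits (the weights are arbitrary illustration values).
BIASED_DIGITS = [1, 2, 5] * 3 + list(range(10))

def biased_number():
    """A 3-digit number whose last digit follows the biased distribution."""
    return random.randint(10, 99) * 10 + random.choice(BIASED_DIGITS)

N = 5000
raw = [biased_number() for _ in range(N)]
sums = [sum(biased_number() for _ in range(8)) for _ in range(N)]

raw_counts = Counter(v % 10 for v in raw)   # heavily skewed toward 1, 2, 5
sum_counts = Counter(v % 10 for v in sums)  # close to uniform
print("raw: ", sorted(raw_counts.items()))
print("sums:", sorted(sum_counts.items()))
```

The skew in the summed numbers decays geometrically with the number of addends, which is why totals built from many lower-level reports should show no digit preference even if each underlying report does.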
The casualty figures published by the Hochu Zhit project exceed those calculated by Meduza and Mediazona by nearly 40 percent compared with their estimates for 2024, and by about 55 percent relative to their assessment at the end of the summer of 2025. In theory, such an increase could be explained by an intensification of combat operations—Meduza and Mediazona, for instance, estimated that Russian military deaths in 2024 were 75 percent higher than in 2023. However, our analysis found clear signs of data manipulation in the Hochu Zhit report, indicating that the figures cannot be considered a reliable source for assessing Russian losses.