Very Hot Topic (More than 25 Replies) Yelena Dembo on Chess.com (Read 81511 times)
Schaakhamster
God Member
*****
Offline


I Love ChessPublishing!

Posts: 650
Joined: 05/13/08
Re: Yelena Dembo on Chess.com
Reply #162 - 09/24/10 at 13:06:23
drogo wrote on 09/24/10 at 12:34:22:
Schaakhamster wrote on 09/24/10 at 12:17:00:
The top 3/4 method isn't that difficult to grasp. I do understand how it works. But the way you draw conclusions from the results is quite arbitrary. For instance, the +5% rule: what is that based on? Just because it looks nice? Statistics allows us to calculate probabilities and thresholds (for instance: if the result is higher than X, only Y% of the results will be higher). Then it is basically just a case of choosing the threshold, which is a policy decision (how do we value the chance that we will exclude people who aren't cheating?).



But how can we calculate the probability that a random variable is bigger than its maximum value?


Are you serious? The maximum value would be 100%. Simple as that.

No, seriously: the fact that your data doesn't contain a higher value doesn't make it impossible that one exists somewhere out there. What statistics does allow is to estimate, based on the data, how likely a certain result would be.
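
To make that concrete, here is a minimal sketch. It assumes, purely for illustration, that benchmark top-1 match rates are roughly normally distributed; the sample values and the name `tail_probability` are hypothetical and are not chess.com's data or method.

```python
from statistics import NormalDist, mean, stdev

def tail_probability(sample, x):
    """Estimate P(X > x) from a normal distribution fitted to `sample`."""
    fitted = NormalDist(mean(sample), stdev(sample))
    return 1.0 - fitted.cdf(x)

# Hypothetical benchmark top-1 match rates (percent), one per clean player.
benchmark = [52.0, 55.5, 57.0, 58.5, 60.0, 61.0, 63.5]

# The largest observed value is 63.5, yet the fitted model still assigns
# a small, non-zero probability to results above it.
p_above_max = tail_probability(benchmark, max(benchmark))
print(f"P(X > {max(benchmark)}) is roughly {p_above_max:.3f}")

# Going the other way: the threshold that only 1% of clean results exceed.
fitted = NormalDist(mean(benchmark), stdev(benchmark))
print(f"99th percentile threshold: {fitted.inv_cdf(0.99):.1f}")
```

Whether a normal fit is appropriate for match rates is itself something that would have to be checked against real data.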

  
 
Volcanor
Junior Member
**
Offline



Posts: 61
Location: Switzerland
Joined: 03/16/09
Gender: Male
Re: Yelena Dembo on Chess.com
Reply #161 - 09/24/10 at 12:54:16
Drogo: stop asking the same question. Either you're a few steps ahead of most of us in statistics, or you have no idea about statistical tests.

If you know and understand what is meant by the H0 hypothesis, variance, Student's t-tests, the alpha risk and Bonferroni corrections, fine: you're better than me in statistics and I apologize for not understanding your smart question. And I'd be pleased if you'd be kind enough to formulate the question in words that I can understand.

If you don't understand the above terminology, you should open a book on statistics instead of posting the same remark again and again!
  
 
mathman
YaBB Newbies
*
Offline


I Love ChessPublishing!

Posts: 2
Joined: 09/23/10
Re: Yelena Dembo on Chess.com
Reply #160 - 09/24/10 at 12:44:39
drogo wrote on 09/24/10 at 12:12:47:
@mathman:
"The main statistical principle which these pages show has been misunderstood by the chess world is that a move that is given a clear standout evaluation by a program is much more likely to be found by a strong human player. And a match to any engine on such a move is much less statistically significant than one on a move given slight but sure preference over many close alternatives. "

That's an excellent comment! I'm not sure I understand how Regan applies this principle in practice though. I think that he defines a partial match if the difference between the chosen move and the top engine pick is at most 0.20. He also defines ties if there are more moves with the same evaluations (and it's often the case in the ending to get 5-6 moves evaluated at 0.00). But, from the results he presents there, I'm not sure where he did what you said.


The partial matches and ties are not what he uses to make statements about confidence intervals. You have to look at the end of http://www.cse.buffalo.edu/~regan/chess/fidelity/M-Kaccuse/M-Kresults.txt to see what he calls his prediction model.

It's not easy to work out how he arrives at his predictions for a move (with evaluations e1, e2, e3, ... for the best, second-best, etc. moves); he just mentions that he used a regression model.

It's quite complicated to find good estimates. I think you will be stuck with high variances due to situations like the following: say for delta=0.6 you have p=0.828 and for delta=0.1 you have p=0.388, as in my example. Now consider the following evaluations in a position: e1=0, e2=0, e3=0.6. If you only look at the first move, then you would take as the prediction something lower than 0.388. But if you look more closely, it's logical to assume that there's a 0.828 chance that players in the data set would play the best or second-best move, because delta2 is 0.6. The first and second-best moves score equally well, so you can further assume that players who reject move3 (and lower-scoring moves) will choose move1 with 50% probability, making a better estimate p=0.828*0.5=0.414.
Maybe by using a complex regression model you can get good estimates for every set of evaluations from a position, but I think that will require a huge data set.
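
The tie adjustment in the example can be sketched as follows. This is only an illustration of the reasoning in the post, not Regan's model. It assumes the evaluations are given as deficits relative to the best move (so the example reads 0, 0, 0.6), that tied moves share the probability equally, and that the gap to the first non-tied move appears in the lookup table; the function name `p_top_choice` is hypothetical.

```python
# A priori chances taken from the two deltas used in the example.
P_BY_DELTA = {0.1: 0.388, 0.6: 0.828}

def p_top_choice(deficits, p_by_delta, tie_tol=0.0):
    """Estimate the chance that the engine's first choice is played.

    deficits: evaluation deficits relative to the best move, best first,
        e.g. [0.0, 0.0, 0.6].
    p_by_delta: maps a gap (delta) to the a priori chance that a player
        picks a move from the group ahead of that gap.
    Moves within tie_tol of the best move count as tied and split the
    probability equally.
    """
    tied = [d for d in deficits if d <= tie_tol]
    if len(tied) == len(deficits):
        return 1.0 / len(tied)  # every listed move is tied
    gap = round(deficits[len(tied)], 2)  # gap to the first non-tied move
    return p_by_delta[gap] / len(tied)

# The position from the post: two equal top moves, third move 0.6 behind.
print(p_top_choice([0.0, 0.0, 0.6], P_BY_DELTA))

# A position with a single best move 0.1 ahead of the rest.
print(p_top_choice([0.0, 0.1, 0.3], P_BY_DELTA))
```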
  
 
drogo
YaBB Newbies
*
Offline


I Love ChessPublishing!

Posts: 40
Joined: 09/21/10
Re: Yelena Dembo on Chess.com
Reply #159 - 09/24/10 at 12:34:22
Schaakhamster wrote on 09/24/10 at 12:17:00:
The top 3/4 method isn't that difficult to grasp. I do understand how it works. But the way you draw conclusions from the results is quite arbitrary. For instance, the +5% rule: what is that based on? Just because it looks nice? Statistics allows us to calculate probabilities and thresholds (for instance: if the result is higher than X, only Y% of the results will be higher). Then it is basically just a case of choosing the threshold, which is a policy decision (how do we value the chance that we will exclude people who aren't cheating?).



But how can we calculate the probability that a random variable is bigger than its maximum value?
  
 
Schaakhamster
God Member
*****
Offline


I Love ChessPublishing!

Posts: 650
Joined: 05/13/08
Re: Yelena Dembo on Chess.com
Reply #158 - 09/24/10 at 12:17:00
Zygalski, we are not doubting the results of your work. But we are doubting the method behind it. As you have clearly stated that you don't understand the statistics behind the method you use I don't think there is anything you can add that will take our skepticism away.

The top 3/4 method isn't that difficult to grasp. I do understand how it works. But the way you draw conclusions from the results is quite arbitrary. For instance, the +5% rule: what is that based on? Just because it looks nice? Statistics allows us to calculate probabilities and thresholds (for instance: if the result is higher than X, only Y% of the results will be higher). Then it is basically just a case of choosing the threshold, which is a policy decision (how do we value the chance that we will exclude people who aren't cheating?).

Basically I'm having a tough time believing in a system that gets its viability from statistics but ignores all statistical rules when drawing conclusions from the results. I do think that is why chess.com quickly gave in to Dembo's demands. I think a statistician would have a field day with the method used, unless they use more sophisticated methods after one of their monkeys (no insult intended) comes across something.


 

  
 
drogo
YaBB Newbies
*
Offline


I Love ChessPublishing!

Posts: 40
Joined: 09/21/10
Re: Yelena Dembo on Chess.com
Reply #157 - 09/24/10 at 12:12:47
@mathman:
"The main statistical principle which these pages show has been misunderstood by the chess world is that a move that is given a clear standout evaluation by a program is much more likely to be found by a strong human player. And a match to any engine on such a move is much less statistically significant than one on a move given slight but sure preference over many close alternatives. "

That's an excellent comment! I'm not sure I understand how Regan applies this principle in practice though. I think that he defines a partial match if the difference between the chosen move and the top engine pick is at most 0.20. He also defines ties if there are more moves with the same evaluations (and it's often the case in the ending to get 5-6 moves evaluated at 0.00). But, from the results he presents there, I'm not sure where he did what you said.
  
 
drogo
YaBB Newbies
*
Offline


I Love ChessPublishing!

Posts: 40
Joined: 09/21/10
Re: Yelena Dembo on Chess.com
Reply #156 - 09/24/10 at 12:03:27
Smyslov_Fan wrote on 09/24/10 at 06:59:13:
So, the level of analysis doesn't really matter?  It can be anywhere between 12 and 22 ply and it will have consistent results?  That's 6-11 full moves, and there's no distinction?



Finally someone makes a sound objection! Indeed, I always felt that depth 12 is just too low for present-day engines and very unreliable.

But just to be clear, there is no official statement that the method described here by Zygalski is the method used by chess.com. For instance, I think that it would make sense that, once you have all the game analyzed, not to compute the average tactical error.

Finally, I want to repeat: I'm not here to accuse Yelena Dembo, directly or indirectly. I simply want to understand if and how human play can be discerned from engines. I hate the cheaters who fill the internet servers, I hate that almost anyone plays blitz and bullet on ICC because of these losers who destroy the game. That's all.
  
 
mathman
YaBB Newbies
*
Offline


I Love ChessPublishing!

Posts: 2
Joined: 09/23/10
Re: Yelena Dembo on Chess.com
Reply #155 - 09/24/10 at 11:08:18
I've read the discussions in this thread and numerous threads on the cheating forum on chess.com. I find it quite amazing that most of the defenders of the top3 method don't take good notice of what the expert on this matter (Kenneth Regan) has stated on his website (http://www.cse.buffalo.edu/~regan/chess/fidelity/) about the matchup methods:

The main statistical principle which these pages show has been misunderstood by the chess world is that a move that is given a clear standout evaluation by a program is much more likely to be found by a strong human player. And a match to any engine on such a move is much less statistically significant than one on a move given slight but sure preference over many close alternatives.

He himself gives an example of how to cope with this difficulty in his analysis of the Mamedyarov-Kurnosov incident. Although his method is not fully explained, I think it is something like this:

- first you need a large set of positions with evaluations (for the 10 best moves in every position or so) and the move which was actually played;
- you have to take care that the set is filled with positions from games played under the same conditions as the positions you want to examine for cheating; therefore Regan's figures are not applicable to cc-chess, as he used 10,000 positions from OTB games by strong grandmasters;
- from this data you can find the a priori probability p that a player would play the move with the highest evaluation in a position where there is a difference delta between best and second best evaluations;
- for testing a game you would have to sum (for all the moves you want to take into consideration) all the a priori chances. That would give an average score for this game. You could view that score as the average score that would be achieved by the players in the data set given the same positions, although that's already a disputable statement;
- compare this model score with the actual score in the game.

OK, nothing really different from the top-3 matchup method promoted by Zygalski and others. But the big difference is that you could calculate variances and confidence intervals.
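
The summing step, and the variance it makes available, can be sketched like this. It assumes each move is an independent yes/no trial with its own a priori chance, which is a simplification; the function names and the five example probabilities are hypothetical.

```python
from math import sqrt

def expected_hits(probs):
    """Mean and standard deviation of the number of top-choice matches,
    treating each move as an independent Bernoulli trial."""
    mean = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs)
    return mean, sqrt(variance)

def confidence_interval(probs, z=1.96):
    """Approximate interval for the number of matches (normal approximation)."""
    mean, sd = expected_hits(probs)
    return mean - z * sd, mean + z * sd

# Hypothetical a priori chances for a five-move stretch of a game.
probs = [0.4, 0.6, 0.7, 0.8, 0.9]

mean, sd = expected_hits(probs)
low, high = confidence_interval(probs)
print(f"expected matches: {mean:.2f}, sd: {sd:.2f}")
print(f"approximate 95% interval: {low:.2f} to {high:.2f}")
```

This covers only the spread that comes from the moves themselves; uncertainty in the a priori chances would widen the interval further.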

I do not have a good set of data to use, but in one of the chess.com threads there were some interesting figures (http://www.chess.com/groups/forumview/titled-player-banned message #291) from which you can derive estimates for the a priori chances (the source was the 8th ICCF World Championship).
I did it quite quickly but this is what I got out of it:

delta   positions   No1-played   p       var
0.1     1202        442          0.388   0.14
0.2     291         169          0.581   0.34
0.3     169         110          0.651   0.43
0.4     83          58           0.699   0.49
0.6     93          77           0.828   0.69
0.8     63          46           0.730   0.54
1.0     46          42           0.913   0.85
                       
So in a position where the engine gives a difference of .4 between best and second best move, you would expect that the best move is played in 69.9% of the cases. The last column is an indication how good the p-values are determined by this sample.
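
For reference, the p column can be recomputed from the counts, together with the usual binomial standard error as one possible measure of how well each p is pinned down. This is a sketch, not the poster's calculation: the standard error is not the same quantity as the var column, whose definition the post does not give. Note also that the counts in the first row give 442/1202, about 0.368, rather than the listed 0.388; the post does not say which figure is mistyped.

```python
from math import sqrt

# (delta, positions, times the engine's first choice was played), from the table.
ROWS = [
    (0.1, 1202, 442),
    (0.2, 291, 169),
    (0.3, 169, 110),
    (0.4, 83, 58),
    (0.6, 93, 77),
    (0.8, 63, 46),
    (1.0, 46, 42),
]

def estimate(positions, played):
    """Return the estimated probability and its binomial standard error."""
    p = played / positions
    se = sqrt(p * (1.0 - p) / positions)
    return p, se

for delta, n, k in ROWS:
    p, se = estimate(n, k)
    print(f"delta={delta:.1f}  p={p:.3f}  standard error={se:.3f}")
```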

Now I did an analysis of the earlier mentioned game (message #68) kingboy-dembo, moves 8-34, with Houdini, multiline mode 5 for depth 17 using Arena (forward analysis, cleared hash at start and all those issues).
Results from the game: 23 times Dembo played the number 1 choice, which is 85%. The model with the a priori chances gives 16 hits on average, but the standard deviation is about 4! So the 23 out of 27 score is high but within margins you might expect.
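
Using the figures just quoted (23 observed, 16 expected, standard deviation about 4), the comparison can be expressed as a z-score. The normal approximation used for the tail probability is an assumption and is rough for only 27 moves.

```python
from statistics import NormalDist

def z_score(observed, expected, sd):
    """Number of standard deviations the observed score lies above the model."""
    return (observed - expected) / sd

z = z_score(observed=23, expected=16, sd=4)

# One-sided tail probability under a normal approximation.
one_sided_p = 1.0 - NormalDist().cdf(z)
print(f"z = {z:.2f}, one-sided tail probability about {one_sided_p:.3f}")
```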

And I have to say that with better model values I would expect the average to be higher. For instance if the delta was 0.23 I have taken as p the value for delta=0.2 from the table given. As there is no value for delta=0 I just assumed it to be 0.3 (with variance 0), that is just a guess.

A bigger sample of positions is of course needed to draw conclusions on cheating. What chance of a false positive do you think is acceptable when blaming someone for cheating? 1 in a million?
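
To give a feel for what such a requirement means, here is a sketch that converts an acceptable false positive rate into a z-score threshold, again assuming a normal approximation. The function names and the number of players screened are hypothetical.

```python
from statistics import NormalDist

def z_threshold(false_positive_rate):
    """z-score a result must exceed so that clean players are flagged
    with at most the given probability (one-sided, normal approximation)."""
    return NormalDist().inv_cdf(1.0 - false_positive_rate)

def expected_false_positives(false_positive_rate, players_screened):
    """Expected number of clean players flagged when many are screened."""
    return false_positive_rate * players_screened

for rate in (0.05, 0.001, 1e-6):
    flagged = expected_false_positives(rate, players_screened=100_000)
    print(f"rate={rate:g}  z threshold={z_threshold(rate):.2f}  "
          f"expected false positives per 100,000 players={flagged:g}")
```

The stricter the rate, the further above the model a score has to lie before it counts as evidence, and the more moves are needed to reach that level.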

My main point is that the variances are far bigger than many might think, and that you will wrongly accuse a lot of innocent players (false positives) if you don't use a very solid model (and probably a lot of games are needed for the check). The variances come from two sources: the p-values are derived by sampling, and the use of the p-values (even if they were known exactly) in the model also adds to the variance.
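
The two sources can be put into one rough formula. A sketch, assuming the moves are independent and that every game move in the same delta bucket reuses the same estimated p (so its sampling error is shared): the variance of (observed hits minus predicted hits) is the sum over buckets of m*p*(1-p) + m^2*p*(1-p)/n, where m is the number of game moves in the bucket and n is the sample size behind p. The function name and the move counts per bucket are hypothetical; p and n are taken from the table.

```python
from math import sqrt

def score_variance(buckets):
    """Split the variance of (observed hits - predicted hits) into two parts.

    buckets: list of (moves_in_game, p_hat, sample_size), one per delta bucket.
    Returns (model_var, sampling_var):
      model_var    - spread of the moves themselves, as if p were exact
      sampling_var - extra spread because p_hat was estimated from a sample
    """
    model_var = sum(m * p * (1.0 - p) for m, p, n in buckets)
    sampling_var = sum(m * m * p * (1.0 - p) / n for m, p, n in buckets)
    return model_var, sampling_var

# Hypothetical split of a 27-move game over three delta buckets.
game = [(12, 0.388, 1202), (9, 0.581, 291), (6, 0.828, 93)]

model_var, sampling_var = score_variance(game)
total_sd = sqrt(model_var + sampling_var)
print(f"model variance: {model_var:.2f}")
print(f"sampling variance: {sampling_var:.2f}")
print(f"total standard deviation: {total_sd:.2f}")
```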

There are issues with this method, which can be discussed. But to me it looks far better than the simple matchup method most 'cheathunters' use.
  
 
Ametanoitos
God Member
*****
Offline


The road to success is
under construction

Posts: 1427
Location: Patras
Joined: 01/04/05
Re: Yelena Dembo on Chess.com
Reply #154 - 09/24/10 at 09:30:58
Exactly! That's what I wanted to say! Chess.com can use these "suspect" methods of course, but it is just stupid to rely on them to ruin reputations! And if this is about a fake profile or an anonymous player, whatever, no harm is done. But for a known professional? Don't you understand that a lot of people here are upset by this behaviour of chess.com?

Quote:
What credentials do I need?
I'm effectively just a copying/pasting monkey - same as anyone who uses an engine in online chess to suggest moves once out of database


I suspected that this is the problem, and I suspect that the chess.com officials who make decisions such as banning Dembo are exactly the same kind of monkeys! And this is irresponsible behaviour, to say the least!

You can run your "monkey-tests", but you have to consult a real person's responsible opinion. You cannot claim that the computer decided that! Where do you live?! Maybe we should let computers run for President of the USA! People who KNOW should be the ones in a position to accuse others of cheating. And if an accusation is about to be made, you should first inform the person you are going to accuse before you ban her.
  
 
Volcanor
Junior Member
**
Offline



Posts: 61
Location: Switzerland
Joined: 03/16/09
Gender: Male
Re: Yelena Dembo on Chess.com
Reply #153 - 09/24/10 at 09:13:14
Zygalski wrote on 09/23/10 at 19:56:18:
What happened was this.
-I was taught the methodology
-then I started creating my own benchmarks independently (which you see on the previous page)
-after about a year I contacted the FM who taught me & said something like "it's 60/75/85%, isn't it?" and he replied "I can't tell you what our control thresholds are, but you're incredibly close".
-it was suggested to me by another game mod that I should add +5% to each of those thresholds (min 500 non-database moves, usual suspect game selection criteria) to eliminate as many false +ve's as possible, whilst not letting "too many blatant engine users escape detection".

That is really all I can say as far as the thresholds are concerned.

I hope you appreciate the effort that went into just my benchmarks in the last 2 or 3 years.
Perhaps looking at those stats again, be honest with yourselves, anyone getting results over
top 1 match: 70%
top 2 match: 80%
top 3 match: 90%
same analysis methods as benchmarks
...it isn't too surprising that they get banned from site, is it? 

Zygalski, you seem to have done some hard work on this issue and to be well-intentioned in sharing it with us. But it would have been smart of the people who taught you this methodology to include some statistics with it. The main problem with your methodology (I put aside the relevance of the data) is that you simply calculate a mean for a player or a lot of players (for t1, t2, t3, t4 or whatever else) and compare it with a mean for a specific player such as YD. You don't perform a statistical test, but add an arbitrary 5% and say that it avoids falsely detecting non-cheaters. That's not statistics, or it's statistics from the 15th century.

If you want to statistically compare two means, you should use a t-test (http://en.wikipedia.org/wiki/T-test) or probably a more sophisticated statistical test (i.e., one which I don't understand, but which would probably be needed for this application, including Bonferroni corrections or others). As a result, you would test a hypothesis (Is YD's data statistically different from my original data?) and have a risk of error associated with declaring YD's data statistically different from the pre-computer era data. As long as you don't perform a statistical test, you should stop defending your methodology.
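
A minimal sketch of such a comparison, using Welch's version of the t-test (which does not assume equal variances) and a Bonferroni-adjusted significance level. The per-game match rates below are invented for illustration, and the choice of Welch's test and of four simultaneous tests (t1 to t4) is an assumption. Only the t statistic and degrees of freedom are computed here; the p-value would come from a t table or from a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean a
    vb = variance(sample_b) / nb  # squared standard error of mean b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level when n_tests hypotheses are tested at once."""
    return alpha / n_tests

# Hypothetical per-game top-1 match rates (percent).
suspect = [68.0, 72.5, 70.0, 75.0, 66.5, 71.0]
benchmark = [55.0, 61.5, 58.0, 52.5, 63.0, 57.5, 60.0, 54.0]

t, df = welch_t(suspect, benchmark)
alpha = bonferroni_alpha(0.05, n_tests=4)  # e.g. t1, t2, t3 and t4 tested together
print(f"t = {t:.2f} with about {df:.1f} degrees of freedom")
print(f"each test must reach p < {alpha:.4f}")
```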

In my opinion, statistical tests should be conducted to screen for suspects, but not to establish whether the suspect is guilty or not. At this point, I think that 2 or more experts (IM or GM correspondence chess players) should independently look at the games of the suspect YD and decide whether (s)he cheated or not in their opinion. Of course, not by performing some statistical test at this step, but by considering all the moves, why YD sometimes deviated from the computer moves, and so on.

Finally, the decision should be made by the people in charge of chess.com, taking into account the statistics, the report of the experts and other factors such as the suspect's overall win/loss ratio and the speed at which (s)he plays his/her games.
  
 
Ametanoitos
God Member
*****
Offline


The road to success is
under construction

Posts: 1427
Location: Patras
Joined: 01/04/05
Re: Yelena Dembo on Chess.com
Reply #152 - 09/24/10 at 08:55:57
Zygalski wrote on 09/24/10 at 07:16:37:
Smyslov_Fan wrote on 09/24/10 at 06:59:13:
Zygalski wrote on 09/24/10 at 06:08:29:
...
analysing now under the following conditions:
Houdini 1.03a x64 4_CPU Search time: 40s Min Depth: 12 ply Max Depth: 22 ply Hash Table: 512Mb
4x AMD Phenom 2.30Ghz 4GB RAM

Batch analyzer estimated time remaining: 21hrs 56mins


So, the level of analysis doesn't really matter?  It can be anywhere between 12 and 22 ply and it will have consistent results?  That's 6-11 full moves, and there's no distinction?

With that level of accuracy, you may be better off with a ouija board.

Instead of fixing the number of seconds, fix the depth of analysis!

So, say I fix the depth at 24.
Are you going to pay my electricity bill or buy me a new pc if mine goes bang after 3 days of 100% CPU use?
Be reasonable.



Are you going to pay for Dembo's lost courses? I bet she lost some money because of the accusations from chess.com!

Was Dembo informed that the site was going to close her account, or did she find out the "hard" way? This looks like a really bad policy from chess.com, and I know some titled players who will never go and play there because of that.

Also, if I remember correctly, you said that Dembo's statistics were about 5% over the expected. This looks suspect to me. Using Houdini and Rybka I found out that they do not always agree about the best and second-best moves. Using Fritz this percentage was higher! I wrote a book where I analysed many, many positions that were evaluated very differently by the top engines. So, if someone is using a Fritz 12 engine at reasonable moments in his game, he will never be caught! And the whole engine-corr play issue is another dark story. Bern in his Stonewall book explains the use of the Fritz 9 engine, which understands the Stonewall positions better than Fritz 10! So, a serious corr player always checks his analysis with an engine, and this doesn't look like cheating in my eyes.

May I ask you guys (Zygalski and drogo) about your playing strength? This will answer many of my questions.
  
 
Smyslov_Fan
God Member
Correspondence fan
*****
Offline


Progress depends on the
unreasonable man. ~GBS

Posts: 6902
Joined: 06/15/05
Re: Yelena Dembo on Chess.com
Reply #151 - 09/24/10 at 08:09:17
Zygalski, I didn't say that you needed to increase the ply count. I said you needed to be consistent, and know why you chose the ply count you chose.
  
 
Schaakhamster
God Member
*****
Offline


I Love ChessPublishing!

Posts: 650
Joined: 05/13/08
Re: Yelena Dembo on Chess.com
Reply #150 - 09/24/10 at 07:56:43
Sounds like a sect to me: I was taught by the master. I then repeated his teachings to the letter on my own. Great was my amazement when he foretold my results.

Anyway the critical question hasn't been answered and I doubt that our new forum friends can answer it because they just don't know. What is the main idea behind the system? Is it statistically sound? Has it been properly researched?

If chess.com is serious about these methods why do they rely on volunteers for calculations and sorts? I would think they would at least choose the methodology and construct the test data themselves to give to their volunteers if they outsource it because they haven't got the computing power available.
  
 
Smyslov_Fan
God Member
Correspondence fan
*****
Offline


Progress depends on the
unreasonable man. ~GBS

Posts: 6902
Joined: 06/15/05
Re: Yelena Dembo on Chess.com
Reply #149 - 09/24/10 at 06:59:13
Zygalski wrote on 09/24/10 at 06:08:29:
...
analysing now under the following conditions:
Houdini 1.03a x64 4_CPU Search time: 40s Min Depth: 12 ply Max Depth: 22 ply Hash Table: 512Mb
4x AMD Phenom 2.30Ghz 4GB RAM

Batch analyzer estimated time remaining: 21hrs 56mins


So, the level of analysis doesn't really matter?  It can be anywhere between 12 and 22 ply and it will have consistent results?  That's 6-11 full moves, and there's no distinction?

With that level of accuracy, you may be better off with a ouija board.

Instead of fixing the number of seconds, fix the depth of analysis!
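
In UCI terms, the difference between the two approaches is a single command: `go depth N` stops at a fixed depth, while `go movetime MS` stops after a fixed time and so reaches whatever depth the hardware allows. The sketch below only builds the command strings; it does not talk to an engine, and the waits for the engine's replies are left out. The helper name is hypothetical, and `Hash` and `MultiPV` are conventional option names that a given engine may or may not offer.

```python
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

def uci_analysis_commands(fen, depth=None, movetime_ms=None, multipv=1, hash_mb=None):
    """Build the UCI command sequence for analysing one position.

    Exactly one of depth (in plies) or movetime_ms must be given.
    """
    if (depth is None) == (movetime_ms is None):
        raise ValueError("give exactly one of depth or movetime_ms")
    commands = ["uci"]
    if hash_mb is not None:
        commands.append(f"setoption name Hash value {hash_mb}")
    commands.append(f"setoption name MultiPV value {multipv}")
    commands.append("ucinewgame")  # tells the engine a new game follows
    commands.append(f"position fen {fen}")
    if depth is not None:
        commands.append(f"go depth {depth}")  # same depth on any machine
    else:
        commands.append(f"go movetime {movetime_ms}")  # depth varies with hardware
    return commands

# Fixed depth: every position is searched to the same depth.
for line in uci_analysis_commands(START_FEN, depth=17, multipv=5, hash_mb=512):
    print(line)

# Fixed time: 40 seconds per position, whatever depth that reaches.
print(uci_analysis_commands(START_FEN, movetime_ms=40000)[-1])
```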
  
 
Smyslov_Fan
God Member
Correspondence fan
*****
Offline


Progress depends on the
unreasonable man. ~GBS

Posts: 6902
Joined: 06/15/05
Re: Yelena Dembo on Chess.com
Reply #148 - 09/24/10 at 06:53:34
My first instinct was to believe that Yelena Dembo cheated, but the people at chess.com weren't very careful in the way they used their data.

Now, the more I look at what the people from chess.com have said here and at chess.com, and the more closely I look at the performance of humans in otb tournaments, the less I believe in their "system".

I understand that chess.com has some interest in protecting its detection methods. But the comments by the people in charge have shown that they did not even begin to consider the issue as a serious statistical problem.

One person here has repeatedly stated that his authority comes from a 2300 rated FM at another site. As if a title in chess is a qualification in statistics!

I still hope chess.com has a reliable method for determining whether computer cheating has occurred. But from what I've seen, they need to start afresh and ask all the key questions over again.
  
 