# Thread: tips on manipulating list data

1. tips on manipulating list data
Originally Posted by trackCanuck
That's one big garden of weeding to get at the 1000th performER, if in fact those lists would do that. Those are lists of performances that contain multiple appearances from many of the same performers, appearing over and over.
Taking m1500 outdoors as an example: The webpage is flat text. Select 8634 lines and save them as a text file. Open a new Excel file. Go to Data > Get Data (dropdown) > From Text/CSV > Navigate to the data file and Import. Select File Origin as 65001 (this will import special characters) and Delimiter Fixed Width with columns 0,8,15,22,23,26,57,60,75,83,112,122,132 then Load.

If you sort by name and time, then flag dups, there are 977 competitors below the 3:38 cutoff. Bernard Lagat has 100 entries, Hicham El Guerrouj has 97.

2. tips on manipulating list data
Originally Posted by trackCanuck
Interesting. But we're still looking at 977 performances , not competitors. At least, not 977 distinct competitors - if I understand your example correctly.
No, those are performers. Sort by names (column F in this case). Add a column (in this case L). Put 1 in the first cell of the column (L1). Add a formula in cell 2: =IF(F2=F1,L1+1,1). That will count the number of performances for each performer. Copy L2 into all the cells below it in the column. Select all the cells in column L. Copy, then Paste Special as Values. Sort by 1:Column L, 2:Column B (Rank). 977 performances have a 1 in the last column. Those are the best performances by 977 performers.

3.
The IAAF all-time top lists have the top 1000+ performers for several events (but some events stop before 1000).

https://www.worldathletics.org/recor...oor/men/senior

Choose an event, pick "Best By Athlete", jump to page 10, and scroll to the bottom. There may be numerous ties when you're looking that far down the all-time list, so performer #1000 may be buried in the many tied for position 900-something, with nobody actually listed at #1000.

In the men's 100m, there are 92 tied for #964 at 10.22, with 10.23 having another string of ties for 1056th. In the marathon, five are tied at #998 with 2:10:00, followed by four ties at #1003 with 2:10:01.

4.
Originally Posted by Master403
No, those are performers. Sort by names (column F in this case). Add a column (in this case L). Put 1 in the first cell of the column (L1). Add a formula in cell 2: =IF(F2=F1,L1+1,1). That will count the number of performances for each performer. Copy L2 into all the cells below it in the column. Select all the cells in column L. Copy, then Paste Special as Values. Sort by 1:Column L, 2:Column B (Rank). 977 performances have a 1 in the last column. Those are the best performances by 977 performers.
"If you sort by name and time, then flag dups, there are 977 competitors below the 3:38 cutoff. Bernard Lagat has 100 entries, Hicham El Guerrouj has 97."

So 195 of Lagat's and El Guerrouj's 197 entries are distinct from the 977. You can't have 977 different performers if 2 of them account for more than 2 of the total. It wasn't clear how those 2 statements are related. You described how to compile a list of performers, then talked about a few who had multiple entries - on a list of performances.

5.
Originally Posted by trackCanuck
"If you sort by name and time, then flag dups, there are 977 competitors below the 3:38 cutoff. Bernard Lagat has 100 entries, Hicham El Guerrouj has 97."

So 195 of Lagat's and El Guerrouj's 197 entries are distinct from the 977. You can't have 977 different performers if 2 of them account for more than 2 of the total. It wasn't clear how those 2 statements are related. You described how to compile a list of performers, then talked about a few who had multiple entries - on a list of performances.
I don't understand your post at all. There are 8634 performances on the referenced list. Lagat has 100 of those (as quoted), not 195. El Guerrouj has 97, not 197. That leaves 8437 performances among 975 other competitors.

6.
Originally Posted by 18.99s
In the 1500m, the IAAF list stops at 3:38:00 with 3 tied for the #962 performer.
977 and 964 are close. One list stops at 3:37.99. The other stops at 3:38.00. That leaves a difference of 977 vs 961. The 977 includes some (undifferentiated) intermediate times in 1 Mile races (official, unofficial or estimates). I doubt IAAF/WA includes unofficial or estimates.

7.
Originally Posted by Master403
I don't understand your post at all. There are 8634 performances on the referenced list. Lagat has 100 of those (as quoted), not 195. El Guerrouj has 97, not 197. That leaves 8437 performances among 975 other competitors.
I am distinguishing between 2 types of lists, top performers and top performances. For example, the all time or yearly top performers lists never have more than one entry per athlete, unlike an all time top performances list which serves a different purpose, and lists many performances from the same performers.

Performers :

https://www.worldathletics.org/recor...sultsonly=true

Performances :

http://www.alltime-athletics.com/m_100ok.htm

The originator of the thread was asking for a list of the #1000 top PERFORMER in events, so the best list isn't one with 154 entries from Jim, 128 from Bob, 37 from Andrew, and so on. You explained how to get to a cutoff point in a list of PERFORMANCES using excel, so have you filtered out the multiple times by the same people or not?

8.
Originally Posted by trackCanuck
Is that a list of top performers, or top performances?
Performers. Performers. Performers.

This is getting frustrating. It took about 10-15 minutes to process one event. It has taken much longer to explain it.

I'll try again. I gave a method of extracting the data from the plain text on the web page and importing them into Excel. (There is a different, more interesting method for tables.)

I then described how to process the data in Excel by sorting the list, primarily by athlete, secondarily by the time rank order. (Row 1 is Abdelali Rayzn, rows 2 through 93 are Abdelati Iguider, ranked by time, and rows 94 through 97 are Abdelaziz Sahere). Once sorted in that manner, one can add a column that has a simple formula in each cell that counts which entry it is for the athlete. (Row 1 has a 1, row 2 has a 1, row 93 has a 92, row 94 has a 1, row 97 has a 4.) Paste the values over the formulas to retain them for further sorting.

Do another sort, this time primarily by the added performance count, secondarily by name. The first row is Abdelali Rayzn's best performance, the second is Abdelati Iguider's best, the third is Abdelaziz Sahere's. The first 977 rows are the best performance by each of the 977 Performers in alphabetical order. That is what was asked.

Once in a spreadsheet, there are other interesting items. The next 683 lines are the second-best performances, meaning 294 performers only broke 3:38 once. A different sort produces a ranking of El Guerrouj's 10th best mark of 3:28.21 ahead of Morceli's 10th best of 3:30.11. El Guerrouj ran 3:30 or better 34 times. Lagat has run under 3:38 100 times, El Guerrouj 97 times.

9.
you're making the assumption that the average shlub has ever even opened Excel, let alone has any proficiency with it.

10.
One problem I have with these sorts, in general, is any discrepancy in the name, including spacing will typically result in them being taken as separate 'individuals'. A related common problem is for females where name changes are common. But the instructions are pretty good (I did not examine really closely because I know how to do such sorting).

Page 1 of 3 123 Last

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•