File talk:Knowledge of English EU map.svg
Instead of destroying information, wouldn't it be better to change the title to "Knowledge of English in EU and Europe"? --J intela (talk) 10:38, 16 November 2011 (UTC)
Suggested corrections
The Isle of Man and the Channel Islands (Guernsey, Jersey) are not part of the EU - these areas should be coloured grey. Gibraltar is a member of the EU (through the UK) - it needs to be coloured for English as a native language. P M C 12:36, 22 January 2012 (UTC)
Date
What is the date of the data used in the map? Eurohunter (talk) 21:29, 18 December 2015 (UTC)
Brexit and future versions
If this map and related ones are updated to reflect Brexit please leave a note on the talk page de:Diskussion:Amtssprachen der Europäischen Union where captions have been changed to "Knowledge of XY in the EU [...] and the United Kingdom". Alternatively you can upload the new files under a new name, preferably one that mentions the year and/or the number of member countries, e.g. "Knowledge of English in the EU 27 in 2020 map.svg" or so. (Abbreviations such as EU 27 are quite common in EU parlance.) Love —LiliCharlie (talk) 08:57, 1 February 2020 (UTC)
Update?
@Darranc: Did you only remove the UK from the map, or did you also update the stats? I ask because the source has data for 2019, as @Nederlandse Leeuw: pointed out. Prototyperspective (talk) 17:34, 15 August 2024 (UTC)
- You know what? I'm just gonna make a new map based on this data with all countries in Europe. It will have a clear file name and clear description as a moment-in-time static snapshot instead of this vague, poorly-sourced, possibly dynamic updatable file about which the geographical scope is unclear. Nederlandse Leeuw (talk) 17:57, 15 August 2024 (UTC)
- File:EF English Proficiency Index 2019 Europe.svg. Done. Let's just use this map instead. Nederlandse Leeuw (talk) 18:49, 15 August 2024 (UTC)
- Thanks for that!
I don't know what you mean by the geographical scope being unclear, as the image is clearly labelled EU, which means that if it is updated it should show data on current EU countries. First I thought it might be better to have an additional file with the Statista data, but apparently they have it from "Klazz" ("According to data provided by Klazz,[…]") and the site it links to is down. No idea what "Klazz" is. So probably it would indeed be good if you manually replaced all uses of this file with your new one. Prototyperspective (talk) 21:14, 15 August 2024 (UTC)
- @Prototyperspective You're welcome!
- Well, let me put it this way. This map is a .svg version of a series of .png/.svg maps that were created in 2006/2008/2009, based on the 2005/2006 Special Eurobarometer 243:
- File:Knowledge English EU map.png 12 November 2006 Aaker > File:Knowledge of English EU map.svg 27 August 2010 by Alphathon.
- File:Knowledge of Spanish EU map.svg 16 March 2008 by Tyk
- File:Knowledge of Italian EU map.svg 2 July 2008 by Tyk
- File:Knowledge of Russian EU map.svg 2 July 2008 by Tyk
- File:Knowledge of German EU map.png 15 May 2009 by HernauMan. > File:Knowledge of German EU map (2010).svg 27 August 2010 by Alphathon.
- File:Knowledge of French EU map.svg 17 October 2009 by Addicted04.
- They all had the file name Knowledge of [language] EU map.png, later .svg, even though "EU" should often not be taken literally. E.g. the 16 March 2008 version of File:Knowledge of Spanish EU map.svg showed the knowledge of Spanish in Croatia and Turkey, even though Croatia did not become an EU member state until 2013, and Turkey still is not an EU member state as of 2024 (and perhaps might never become one). This usually means that the file name is either just a generalisation or a shortening of "Europe" to "EU", or (most likely in this case), based on a dataset containing EU member states + EU candidate member states. Maps like that, based on a specific dataset, should not be updated, but instead dated to a specific year, and identified as a static map for that year. Otherwise you'll risk creating a huge mess when trying to update it with conflicting datasets. I think the datasets used for these maps derive ultimately from the Eurobarometer 2005/2006, but they have unfortunately not been properly identified in the description, let alone the file name. The reason for this is because User:Tyk, who created the oldest maps in March/July 2008, indicated the source as Own work by uploader. Based on the following article from Wikipedia: en:Languages of the European Union. As of mid-March 2008, that article had a section "Language skills of European Union citizens", which featured the following text:
- The following tables are based on "Special Eurobarometer 243" of the European Commission with the title "Europeans and their Languages" (summary full text), published on February 2006 with research carried out in November and December 2005. The survey was published before the 2007 Enlargement of the European Union, when Bulgaria and Romania acceded. This is a poll, not a census. 28,694 citizens with a minimum age of 15 were asked in the then 25 member-states as well as in the then future member-states (Bulgaria, Romania) and the candidate countries (Croatia, Turkey) at the time of the survey. Only citizens, not immigrants, were asked.
- But because that text was subsequently removed or rewritten, the sources got lost, and so it was no longer clear what the maps were based on. Moreover, time passed, and some editors felt the need to "update" or "expand" the maps without knowing what the original maps were based on. This has led all sorts of subsequent editors to remove non-EU countries from some but not all of the maps in this group, while others have tried to "update" some but not all of these maps, but with different datasets than those of the Eurobarometer. The Statista.com dataset appears to be based on Klazz, but that is irretrievable now, as you already correctly noted above. So what we have ended up with now is a whole mess of unclear file names containing unsourced, poorly sourced, poorly described, incompatible maps with unclear geographical scopes. Is it just the EU, or can other countries be included in the dataset? If it is EU-only, which EU? EU25 (2004-2006)? EU27 (2007-2013a)? EU28 (2013b-2020a)? EU27 post-Brexit (2020a-present)? Nobody knows, because of unclear and inconsistent file names, sources, descriptions, and contents (File:Knowledge of French EU map.svg is the only file still explicitly indicating the 2005/2006 Eurobarometer as its source). The German map by HernauMan might not be based on the 2005/2006 Eurobarometer at all, or at least had a different scope at the beginning; the history shows editors removing Bosnia and Herzegovina, Turkey, and the UK for being "non-EU countries", while others readded them again saying that's not relevant, because the dataset included those countries at the time.
- On the other hand, some maps have completely gone the other way, and included countries that never were included in the 2005-2006 Eurobarometer dataset. Compare the original 12 November 2006 Knowledge English EU map.png map with the current Knowledge English EU map.png map (last modified 29 August 2022). Where do the data for Russia, Belarus, Ukraine, Georgia, Armenia and Serbia come from? Why was Cyprus removed? Nobody knows. It's a total mess! I think we should either start over, or at least try to salvage the .svg maps and make clear that they are based on the 2005-2006 Eurobarometer dataset, and that they shouldn't be "updated" or "expanded", but kept the way they are for the 2005 data they show. Cheers, Nederlandse Leeuw (talk) 05:18, 17 August 2024 (UTC)
- PS: https://europa.eu/eurobarometer/surveys/detail/518 has the original PDF file of the Special Eurobarometer 243: Europeans and their languages (6.78 MB - PDF) - EN (Fieldwork: November – December 2005; Publication: February 2006). I'm reading it now. Nederlandse Leeuw (talk) 05:35, 17 August 2024 (UTC)
- The dataset seems to be question D48T "Which languages do you speak well enough in order to be able to have a conversation * TOTAL" on pp. 152-154 of the PDF file. These data should be underlying all 9 maps that I've listed above. If so, then I think a map can be salvaged, but otherwise it might be better to start over with a new map for that language based on this dataset. Nederlandse Leeuw (talk) 05:58, 17 August 2024 (UTC)
- Okay, I think I know why mapmakers have decided to mark certain countries and areas as "Native" instead of giving a percentage: question D48T excludes mother tongues (native languages). That's why you see that the Republic of Ireland has only 5% of respondents saying they can speak English, which probably means that Irish Gaelic was their native language, or they are immigrants who acquired citizenship and learnt English as a second language, but c. 95% of respondents probably have English as their native language. Same goes for 7% of UK respondents saying they can speak English as a second language, which probably means over 90% of UK respondents have English as their first language. The problem is: the dataset does not actually say how many native speakers of English the UK and Republic of Ireland have. We assume that every citizen of those countries either speaks English natively, or has learnt it as a second (or third, fourth etc.) language, because English is a mandatory subject in schools for natives, and English proficiency is a requirement for citizenship for immigrants, but we might have overlooked other possibilities, such as poor education or diminished proficiency.
- I'm not sure how we can resolve this issue. Just labelling an entire country or region "native" is quite problematic, especially with shifting demographics or complicated linguistic geography in countries such as Belgium and Switzerland. But showing Ireland and the UK as English 5% and 7%, respectively, isn't really an option either... Nederlandse Leeuw (talk) 06:19, 17 August 2024 (UTC)
- One more thing to take into account is native English speakers who were EU citizens living outside the UK and Ireland; they are not counted amongst those respondents in those countries, because they wouldn't indicate that English was their second language. So hypothetically speaking, if 1% of respondents in Belgium were native speakers of English, it would push the total knowledge of English in Belgium to 60% instead of 59%, which would require giving Belgium a darker shade of green. But right now, we're not counting them. I think this is a pretty strong argument not to count native speakers in Ireland and the UK either, because they are not recorded in the source either.
- This survey is about non-native speakers of English, so colouring Ireland and the UK 'Native' simply misses the point. It is original research / synthesis, which should not happen in mapping if we want them to be useful for Wikipedia; see Commons:Evidence-based mapping#English Wikipedia precedents. Therefore, I think we need to rethink the entire conception of these maps, and start over. I'm gathering some data and thinking of a reasonable colouring scheme / legend, file name, description, etc. Nederlandse Leeuw (talk) 07:34, 17 August 2024 (UTC)
- Question What is your mother tongue? (SPONTANEOUS – MULTIPLE ANSWERS POSSIBLE) (pp. 141 to 143) contains an overview of native languages (mother tongues) spoken by EU citizens by EU member state, or EU member state candidate. p. 7 (p. 8 of PDF file) contains a handy overview of how native languages of EU citizens by (candidate) member state correlate with the State Language(s), official languages that have an official status in the EU. These two datasets are very important, but also easy to mis-interpret. For example, some people were raised bilingually, so the totals of mother tongues regularly exceed 100%. We shouldn't assume that people had only 1 mother tongue, and that if it was not A, it therefore had to be B, or something. Moreover, p. 141 indicates that 1% of EU citizens in Belgium had Spanish as their mother tongue, but this gets grouped on p. 7 to the 5% in the column Other official EU languages. On the other hand, p. 152 indicates that 6% of EU citizens in Belgium can speak Spanish as a second language, thus the overall total of Spanish speakers in Belgium was 7%, not 6%.
- p. 7 + 154 also show that there is indeed a bit of a gap between native speakers of English and second-language speakers of English in the UK and Ireland. The UK has 92% mother tongue English speakers and 7% second-language speakers, leaving 1% unaccounted for; Ireland had 94% and 5%, respectively, also leaving 1% unaccounted for. This could be due to rounding issues, or a genuine gap of about 1% of the population who could not speak English for whatever reason. In other cases, this gap is undeniable; 82% of Estonian (EE) citizens spoke Estonian natively, and 14% as a second language, leaving a gap of 4% who didn't speak Estonian at all. There is also a 4% gap in Latvia of non-Latvian speakers. In both cases, this probably mostly represented the small minority of monolingual Russian speakers (the majority of Russian mother tongue speakers in both countries, 17% and 26% respectively, did indicate they could speak Estonian and Latvian, respectively).
- If we are very careful, we could combine the mother tongue and second language total datasets to represent overall language proficiency by state. Per en:WP:CALC, we are allowed to make simple calculations like that, based on compatible datasets (and these are from within the same source, so these data are compatible). But I think the relevant data should be stated explicitly in the description of each map, with references to the relevant pages of the Special Eurobarometer 243, so that everyone can verify these data themselves. Nederlandse Leeuw (talk) 09:38, 18 August 2024 (UTC)
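For anyone who wants to check the arithmetic, here is a minimal Python sketch of the simple calculation described above: adding the mother tongue share to the second language share, and looking at what is left over. It only uses the figures already quoted in this thread from Special Eurobarometer 243; the names `FIGURES` and `combined` are made up for illustration, and leaving the sum uncapped is my own assumption (bilingual respondents and rounding can push a total to 100% or beyond).

```python
# Percentages quoted in this thread from Special Eurobarometer 243.
# Each entry: (country code, language) -> (mother tongue %, second language %).
FIGURES = {
    ("UK", "English"): (92, 7),
    ("IE", "English"): (94, 5),
    ("EE", "Estonian"): (82, 14),
    ("BE", "Spanish"): (1, 6),
}


def combined(native, second):
    """Return (total proficiency %, share not accounted for %).

    Assumption: the two shares are simply added and not capped, because
    the survey figures are rounded and mother tongues can overlap.
    """
    total = native + second
    gap = 100 - total
    return total, gap


for (country, language), (native, second) in FIGURES.items():
    total, gap = combined(native, second)
    print(f"{country} {language}: {native} + {second} = {total}% (gap {gap}%)")
```

The gap is only meaningful for a country's own state language (UK, Ireland, Estonia); for Spanish in Belgium it just shows how few respondents reported the language at all.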
Combined dataset Special Eurobarometer 243
Alright, that leads to the following table. I'm just gathering the data here since we're talking about it here.
Code | Country | German | English | Spanish | French | Italian | Russian |
---|---|---|---|---|---|---|---|
EU25 | European Union 25 | 32 | 52 | 15 | 36 | 16 | 7 |
BE | Belgium | 27 | 59 | 7 | 86 | 5 | 0 |
CZ | Czech Republic | 28 | 24 | 0 | 2 | 1 | 20 |
DK | Denmark | 58 | 86 | 5 | 12 | 1 | 1 |
DE | Germany | 99 | 56 | 4 | 15 | 3 | 11 |
EE | Estonia | 22 | 46 | 0 | 1 | - | 83 |
EL | Greece | 9 | 48 | 1 | 8 | 4 | 3 |
ES | Spain | 2 | 27 | 99 | 12 | 3 | 1 |
FR | France | 8 | 37 | 14 | 99 | 7 | 0 |
IE | Ireland | 7 | 99 | 4 | 20 | 1 | 1 |
IT | Italy | 7 | 31 | 4 | 14 | 96 | 0 |
CY | Cyprus | 5 | 77 | 2 | 13 | 4 | 2 |
LV | Latvia | 19 | 39 | 0 | 1 | 0 | 96 |
LT | Lithuania | 14 | 32 | 1 | 2 | 0 | 87 |
LU | Luxembourg | 92 | 61 | 2 | 96 | 6 | 0 |
HU | Hungary | 26 | 23 | 1 | 2 | 2 | 8 |
MT | Malta | 3 | 90 | 2 | 17 | 66 | - |
NL | Netherlands | 71 | 88 | 5 | 29 | 1 | 0 |
AT | Austria | 100 | 58 | 4 | 10 | 8 | 2 |
PL | Poland | 20 | 29 | 1 | 3 | 1 | 26 |
PT | Portugal | 3 | 32 | 9 | 24 | 1 | 1 |
SI | Slovenia | 50 | 57 | 2 | 4 | 15 | 2 |
SK | Slovakia | 32 | 32 | 1 | 2 | 1 | 29 |
FI | Finland | 18 | 63 | 2 | 3 | 2 | 2 |
SE | Sweden | 31 | 89 | 6 | 11 | 2 | 1 |
UK | United Kingdom | 9 | 99 | 8 | 23 | 2 | 1 |
(Countries that were EU candidates in 2005/2006) | |||||||
BG | Bulgaria | 12 | 23 | 2 | 9 | 1 | 35 |
HR | Croatia | 34 | 49 | 2 | 4 | 14 | 4 |
RO | Romania | 7 | 29 | 3 | 24 | 4 | 4 |
TR | Turkey | 4 | 17 | 0 | 2 | 0 | 1 |
- Directorate-General for Education and Culture (February 2006). Special Eurobarometer 243: Europeans and their Languages (Fieldwork: November – December 2005. Publication: February 2006). City of Brussels: European Commission, pp. 141–143, 152–154.
I'll fill in the columns based on the Excel sheet I've made offline. Then we can start comparing the data and discuss a possible colouring scheme. Personally, I find the percentages rather random. Why "Native, 80%>, 50–79%, 30–49%, 20–29%"? As shown, native speakers of Latvian in Latvia were at 73% (so it wouldn't even fall in the "80%>" shade), but the total was still 96%. "Native" is just not important if we want to know the total proficiency, which is native + non-native (or as the report would call it, "mother tongue" + "total languages other than mother tongue"). Austria had even more native speakers of German than Germany itself, and Ireland more English native speakers than the UK (which I don't find surprising btw; it's one of these demographics things you might expect). I'm leaning towards a more gradual shading with steps of 20% or even 10%. The difference between 30% and 49% is huge; it's sort of misleading to literally paint all these countries with one broad brush. Nederlandse Leeuw (talk) 11:13, 18 August 2024 (UTC)
- It's worth noting that Polish was spoken natively by 9% of EU citizens (almost all in Poland) and non-natively by about 1% of EU citizens (most of them in Lithuania), so 10% in total. That's actually more than Russian (1% natively, 6% non-natively), but because Polish is so strongly concentrated in Poland and a little bit in Lithuania, it wouldn't make much sense to map it; it would look very boring, even though statistically it would make sense. Nederlandse Leeuw (talk) 11:52, 18 August 2024 (UTC)
- Thanks a lot for all this info and these insights including on the complexities and challenges / remaining issues involved! Maybe it would be good to add some info on this to the file description so people can go here to find this further info and details and/or to include that table in collapsed form in the file description.
- I saw your Commons:Evidence-based mapping earlier and would like to add a link to it which you could remove if you don't find it useful. Got to say I'm a bit overwhelmed by all the info you provided here and can't give you much feedback on that but I think it would be best to (also) have an as complete map as possible that also considers native English speakers. When it comes to evidence-based mapping the issue is not that people here do calculations or whether the data is retrieved in full from some credible source but 1) whether or not the data is provided 2) whether or not the data is credible and withstanding scrutiny (and this isn't only about factual yes-or-no accuracy but also about whether it's misleading/very incomplete).
- I'm more interested in what to do about all those file uses of the other file – could you replace them all with this or if possible such a more complete map? A tool to mass-replace a specific file would be useful and this proposal of mine at the Community Wishlist is relevant. Prototyperspective (talk) 11:57, 18 August 2024 (UTC)
- @Prototyperspective You're welcome! This is one of these complicated issues that really take time to resolve, but once you tagged me here and asked me follow-up questions, I decided to really delve into it, otherwise we would never clean up this mess. Indeed, I intend to include either the entire table above into the file description of each map, or just the relevant data per language.
- Yes, feel free to add a link to my essay! Haha, sorry to overwhelm you with feedback, I tend to have that effect on people. I'm basically thinking out loud here, both for you and anyone else who might be reading along (right now or later), as well as a very helpful note-to-self, otherwise I would have way too much information to remember!
- I agree with you that providing the data and ensuring accurate data is of primary importance. But retrieving the data in full (or in abbreviated form) is the best way to help the verification process (both for yourself and for others exploring your work); and the question whether we can trust home-made calculations when combining data, leading to composite data not found in the reliable source material, is important to prevent en:WP:SYNTH (or at least the accusation of SYNTH). What seems to have happened with several of these maps, including this one, is the combination of data from different, incompatible datasets. E.g. we shouldn't be mixing up the Special Eurobarometer 243 (2006) with the Special Eurobarometer 386 (2012) or the Special Eurobarometer 540 (2023), let alone with Statista.com/Klazz 2019 (or any other year), or with the EF English Proficiency Index 2019 (or any other year). Because if we did, we would almost certainly be comparing apples and oranges. Neither should we arbitrarily add or remove countries from datasets. EU membership has obviously shifted over the years, and lots of editors seem unable to properly deal with that fact in relation to past data, thinking we need to exclude the UK from pre-2020 sources, or add Croatia to pre-2013 data from different sources etc. With this messy group of maps in particular, I think we can hardly be too careful in stating clearly where our data comes from, except that we shouldn't overwhelm our readers and fellow editors with data either.
- What to do with the file uses is indeed a tough question. In Commons:Evidence-based mapping#Commons recommendations, especially "It is recommended to make maps language-neutral...", I've outlined all the issues with updating and correcting maps that are already widely used across multiple languages. If I single-handedly corrected it all according to the datasets of Special Eurobarometer 243 (2006) pp. 141–143, 152–154, adding all 29 countries therein and my own new colouring scheme in steps of perhaps 10% or 20% per colour, lots of crosswiki map legends / captions would no longer work. But if I created a brand new map that does everything correctly from the start, I would have to manually replace it everywhere, as well as probably request a file rename to eliminate the ambiguities in "Knowledge of English EU map" etc. that have caused so much confusion ever since these maps were originally created in the years 2006 to 2010. So, I haven't yet made a decision on whether to try and fix all maps, or create new maps from scratch and replace them all manually. (For now, I'm looking at how to clean up the info at en:Languages of the European Union#Knowledge as a first step). What would you do if you were in my position? Cheers, Nederlandse Leeuw (talk) 12:45, 18 August 2024 (UTC)
- PS: Hmmm, apparently only de:Europäische Union#Sprachen provided the wrong year (2010) instead of 2006. By far, almost all file uses just say "English knowledge in (the) Europe(an Union)" or somesuch, not mentioning a year, nor providing a legend with percentages, perhaps because that legend has already been visually embedded in the map itself (between Iceland and Ireland). So the risk of legends or captions no longer 'working' is overblown. We can definitely repair and rename these maps (at least this one), and introduce a new colouring scheme, without causing crosswiki problems. Nederlandse Leeuw (talk) 13:09, 18 August 2024 (UTC)
Colouring scheme
Alright, let's have a separate section for a possible colouring scheme. The current colours are as follows (Scheme A):
How about a 20%-step scheme? (Scheme B):
A 10%-step scheme allows for more nuance, but makes colours less distinguishable, unless we add more colours instead of shades of the same colour.
There is no rule against that, and for people with various colour impairments, it would be better, so let's see. (Scheme C):
I think these colours are far more distinguishable, but I'm not sure it would make the map look much better, especially next to other maps that use yellow (Spanish) or blue (French). Nederlandse Leeuw (talk) 13:52, 18 August 2024 (UTC)
I'm thinking about an alternate 20%-step scheme with all-green colours, but better distinction. (Scheme D):
This has my preference so far. Nederlandse Leeuw (talk) 14:05, 18 August 2024 (UTC)
- Hmmm I don't really like the results. I guess Scheme B is still the best, although it will change a lot of colours compared to current Scheme A, and it might not really improve insight, as the two middle groups "20–40%" and "40–60%" are very large. We might want to consider splitting them up in four groups of 10% each. The last group, "0–20%", has only 1 member: Turkey (17%).
- Meanwhile, I've updated the description, hopefully it is much clearer now what is the source, the scope, and the fact that it is not supposed to be updated or expanded or anything based on other data. It's just a matter of deciding on a reasonable colouring scheme to go with the data. We could make separate maps for the Eurobarometers of 2012 and 2023 already, that would already clear up a lot of confusion. Nederlandse Leeuw (talk) 15:41, 18 August 2024 (UTC)
- I guess Scheme B will win out. It's the best way to make everything consistent across all maps of all three Eurobarometers (243, 386, and 540): steps of 20% each time, instead of all this arbitrary stuff. Hopefully, the shades are easy enough to spot for people with visual impairments; both Schemes C and D are too radical, and don't really solve the issues.
- The main thing to figure out now is the text (containing the percentages) next to the embedded legend in Inkscape; I just can't get that to work. Whenever I upload that, the letters don't show... Nederlandse Leeuw (talk) 22:39, 18 August 2024 (UTC)
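As a cross-check on the Scheme B grouping discussed above, here is a rough Python sketch that sorts the English column of the combined table into 20% steps and counts the members of each group. The function name `scheme_b_bucket` and the boundary handling are assumptions for illustration only: the discussion does not say which group an exact 40% or 60% belongs to, so the sketch treats the lower bound as inclusive and puts 100% in the top group.

```python
from collections import Counter

# English column of the combined Special Eurobarometer 243 table above
# (25 member states plus the four candidate countries of 2005/2006).
ENGLISH_TOTAL = {
    "BE": 59, "CZ": 24, "DK": 86, "DE": 56, "EE": 46, "EL": 48, "ES": 27,
    "FR": 37, "IE": 99, "IT": 31, "CY": 77, "LV": 39, "LT": 32, "LU": 61,
    "HU": 23, "MT": 90, "NL": 88, "AT": 58, "PL": 29, "PT": 32, "SI": 57,
    "SK": 32, "FI": 63, "SE": 89, "UK": 99, "BG": 23, "HR": 49, "RO": 29,
    "TR": 17,
}


def scheme_b_bucket(percent):
    """Return the 20%-step group label for a percentage.

    Assumption: lower bounds are inclusive (40 -> "40–60%") and 100
    is placed in the top group.
    """
    if not 0 <= percent <= 100:
        raise ValueError(f"percentage out of range: {percent}")
    lower = min(int(percent // 20) * 20, 80)
    return f"{lower}–{lower + 20}%"


counts = Counter(scheme_b_bucket(value) for value in ENGLISH_TOTAL.values())
for label in ("0–20%", "20–40%", "40–60%", "60–80%", "80–100%"):
    print(label, counts[label])
```

With these assumptions the two middle groups hold 12 and 7 of the 29 countries and the lowest group holds only Turkey, which matches the observation above that "20–40%" and "40–60%" are very large.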