2021 Localization Summit Session One: Emerging Threats & Unmet Needs

Summary

Session One of the Localization Summit took place on May 21 and focused on Emerging Threats and Unmet Needs. Fourteen language team leaders presented to more than forty participants and observers on the challenges their language groups face, particularly in the context of the COVID-19 pandemic.

Emergent Themes

Although the challenges facing the language groups represented in the thirteen presentations at the Summit’s first meeting are context-specific and differ in important ways, several common themes emerged from Friday’s discussion.

Challenges Exacerbated by the COVID-19 Pandemic

The pandemic and national responses to it largely worsened situations that were already bad. In many places, surveillance and monitoring increased, misinformation and hate speech spread, and independent media were attacked. Contact tracing methods introduced new threats to privacy in the name of public health and order. In place after place, these developments disproportionately affected social minority groups, including women, ethnic minorities, language minorities, disabled communities, and more.

Technical Language and Standardization

Standardization remains a problem across language teams. Many languages lack an established glossary that can serve as a standard reference, often because standardization raises questions that are difficult to answer. How can a team establish a common term in a language with more than 20 dialects? What should be used when a technical term rests on an English metaphor or reference (e.g. a “cloud” server) whose direct translation does not convey the same meaning or does not follow the same grammatical rules? (For example, “troll” works as both a noun and a verb in English, but the same is not true in Arabic.) The lack of standardized technical language creates real problems, yet settling on localized terms is itself a complicated task.

History Shaping the Present

A few presenters discussed how colonial histories have shaped present-day contexts around language. It is important to understand these histories when thinking about why English has become the primary online language, how language suppression has shaped cultural spaces both offline and online, and how this might be disrupted in favor of content that originates in non-hegemonic languages. Presenters noted that English colonizers were certainly not the only group to suppress languages, and that the practice is ongoing.

Prioritizing the Audience

Many presenters highlighted the need for developers to design for the communities they hope to reach. Usability remains an issue for many internet freedom projects, and the problem is not limited to confusing apps. Resources should be designed to promote engagement: fewer wordy guides and more engaging experiences, whether audiovisual materials, games, or other media. Content developers also need to consider that many language communities are rooted more in oral than in written tradition, and in communities with lower reading and writing literacy rates, audiovisual content in the local language is likely to reach more people.

Community Sustainability

Language teams reported challenges related to community sustainability. These challenges are not new, but they have become more pronounced since pandemic-related lockdowns went into effect. Recruiting and retaining team members who will contribute over long periods is difficult, and the reasons vary from language to language and team to team. This is the topic of the Summit’s second session, where participants will dig deeper into what it will take to ensure that these communities thrive.

Language Team Presentations

Thai Language Team

Presentation Slides

Challenges and Emerging Threats

Thailand is facing a series of human rights challenges stemming from both political upheaval and the COVID-19 pandemic. The country saw demonstrations large and small throughout 2020, which led to a nationwide emergency decree in March of that year. The decree expanded enforcement and surveillance powers in response to the large-scale protests. Meanwhile, public health measures increased the risk of privacy violations through the widespread collection of personal data, both digital and non-digital.

A slide from the Thai Language Team’s presentation.

It is widely expected that large-scale protests will resume once the coronavirus situation improves. The new Minister of Digital Economy has called for the arrest of alleged lese-majeste offenders (people accused of insulting the monarch), both at home and abroad. Even criticizing a vaccination plan may result in a charge. In the same vein, a right-wing group is physically pursuing internet users who are critical of the monarchy, intimidating them into confessions and public apologies.

Needs

Our Thai language coordinator stressed that the democracy movement needs a chat app with a reliable connection and a collaboration function. The movement currently uses Telegram, particularly its voting function, but the connection is unreliable at large protests, and because the app’s group chats are open, monitors may be lurking in the larger ones.

Beyond the protesters, the Thai language community has several specific needs. These include technical glossaries, especially for encryption, cryptography, and network protocols (as well as for Signal and MobileCoin). The team also needs wider cross-project translation memory, especially with projects outside OTF. A more open localization platform would be beneficial, and exploring something like Mozilla’s Pontoon would be useful. Finally, the team would like to take on projects beyond localizing apps; there is a particular need to localize websites and online help and support centers.

Spanish Language Team (Latin America)

Presentation Slides

Challenges & Emerging Threats

The Spanish language team highlighted the growth of digital rights violations in the Latin America region in recent years, and especially since the beginning of the COVID-19 pandemic. The protests that enveloped Bolivia, Chile, Peru, Ecuador and elsewhere in 2019, which came to be known as the Latin American Spring, were followed immediately by lockdowns. The backlash from the protests and the lockdowns themselves put women, LGBTQ+, indigenous, disability, and other minority communities at much greater risk of harm.

A slide from the Spanish Language Team’s presentation.

Needs

The team identified several needs, all centered on prioritizing the audience. Communication strategies should be diverse and presented in different formats so that content is accessible to people with little technical familiarity. Project leaders should collaborate directly with educators so they can reach audiences of varied literacy levels and include examples that resonate with people of different lived experiences and risk profiles. Projects should prioritize usability and good user interface and user experience (UI/UX) design; TunnelBear was cited as an example of good UI/UX.

Finally, projects need to include communities in their work. They should pilot test solutions directly with end users across a broad and diverse landscape, and should formally train professionals, especially in civil society organizations and minority groups, in digital security, privacy, and free and open source software.

Shona Language Team

Presentation Slides

Challenges & Emerging Threats

Our Shona language community in Zimbabwe faces deeply structural challenges, both physical and social. Technical infrastructure is both inadequate and concentrated. Although women and rural residents make up the majority of Zimbabwe’s population (52% and 60.2%, respectively), both groups struggle to access mobile devices, computers, financial services, and the internet. Data costs are prohibitively high and rose further during the COVID-19 pandemic, and access to digital literacy and skills training is limited.

Since the beginning of the pandemic, surveillance has increased in Zimbabwe; media workers have been targeted, harassed, assaulted, and arrested; misinformation has spread; repressive laws have been proposed; and the state has taken ownership of media platforms. Costs and unemployment have both climbed, disproportionately affecting minority groups, who face a greater risk of being outed and of experiencing violence in the home.

The response to COVID-19 has also introduced new privacy and data risks. As manual contact tracing has become common in Zimbabwe, more establishments are requiring patrons to leave their contact details, which means there is greater potential for data breaches.

Slides from the Shona Language Team’s presentation.

Needs

There is currently no standardized technical terminology in Shona, primarily because no linguists are doing this localization work as a public service. The challenge is compounded by Shona’s many dialects, which differ enough from one another to make translation difficult. A lack of funding has been a barrier to community engagement, especially as unemployment has risen during the pandemic. In some cases, the physical distance between communities where people are interested in doing this work has also been hard to overcome.

Hindi and Telugu Language Teams

Presentation Slides

Historical Context

Our Hindi and Telugu language coordinator placed the team’s modern challenges in a striking historical context. India’s colonial history shapes its present in ways both obvious and subtle, and this extends to language. The country’s brutal history of language suppression elevated English to primacy at the expense of local languages. Although English could be considered a global minority language by numbers (especially when comparing how many people speak it as a primary rather than a secondary language), it remains the primary language online, in India and around the world.

A slide from the Hindi & Telugu Language Team’s presentation. Source

Modern Challenges

Technology development has largely been controlled by the West, further cementing English as the primary language. While technology has been transferred from the Global North to the Global South, languages did not keep pace with newly developed concepts, tools, and features. As a result, most languages have no equivalents for these new English technical terms. Without equivalents there are no translations, and without translations there are no localized technical glossaries to refer to.

Another slide from the Hindi & Telugu Language Team’s presentation.

Arabic Language Team

Localization Challenges

The Arabic language team regularly encounters challenges when localizing technical terms from English. Many modern technical terms rely on metaphors or cultural references that are Western-centric and do not translate easily to other contexts. This can get complicated. In one example, trainees did not know what a “cloud” server referred to and were too shy to ask for clarification. “Troll” is an example where even near translations are incomplete. Contributors tried translating it as “fly” (as in a housefly, a pest), but “troll” in English is also a verb, and one cannot intuitively derive a verb from the noun “fly” in Arabic. Contributors ended up using “harass,” which still doesn’t capture the baiting that trolling involves.

A slide from the Arabic Language Team’s presentation.

Needs

Standardizing technical terminology has been difficult but is necessary for moving forward with Arabic localization projects. Our Arabic language coordinator stressed that there also needs to be improved collaboration between translators, including across languages, and between developers and translators. Establishing and improving those connections could boost the quality and ease of localization work across projects.

Russian Language Team

Localization Challenges

Our Russian language coordinator drew on his experience observing what works and what doesn’t in digital security trainings. He emphasized that too few tools and resources are designed in ways that are effective or practical for their audiences. Text-heavy guides can sap users’ motivation to educate themselves. Resources also tend to become outdated, especially during the pandemic, when tools have changed rapidly but manuals have not kept up.

Good design for users means minimizing text while still communicating important information in memorable ways. Multimedia can achieve this, as can the use of games and interactive audiovisual resources as teaching tools.

Good Examples

Our Russian language coordinator proposed the following as strong examples of good design for security training:

Igbo, Akan, Yoruba & Ga Language Team

Presentation Slides

Challenges and Emerging Threats

Many people in these four language communities communicate primarily through the spoken word, and reading and writing proficiency rates are low, so sharing text-based resources can be ineffective. When resources are not available in formats people can access, community engagement suffers. Additionally, issues of territorial integrity mean that violence could affect these communities suddenly and unexpectedly, and such challenges are anticipated in the coming year.

Needs

According to the language coordinator, audiovisual resources are far more effective at reaching these communities, but are not widely available in these languages. Localization sprints have been very helpful and more would be useful. Promoting indigenous languages and standardizing written language in resources like glossaries and literature could help expand literacy and make localization easier.

More than translation, though, these communities need in-house content developers who understand the challenges they face and can create content to address them.

A slide from the Igbo, Akan, Yoruba & Ga Language Teams’ presentation.

Khmer Language Team

Challenges and Emerging Threats

Security tools often still struggle with usability. For example, KeePassXC is often considered too complicated, so potential users do not adopt it and instead continue with poor password hygiene. The digital security localization community is small and somewhat insular, so finding trusted people who can contribute over the long term is difficult. Technical terminology remains a problem, too, with a lack of standardization in Khmer and no established glossaries.

Additionally, there are challenges related to privacy and security for Khmer-speaking populations. All internet traffic in Cambodia will now be routed through a new National Internet Gateway controlled by the national government. Misinformation has spread on social media and people accusing the government of peddling falsehoods have been arrested. Digital contact tracing to slow COVID-19 spread, widely done by QR code, has introduced new privacy risks. There is no comprehensive legal protection of the right to digital privacy, and culturally, digital security remains a sensitive topic.

A slide from the Khmer Language Team’s presentation.

Needs

Cambodia needs strong legal protections for the right to privacy, especially digital privacy. Culturally, there needs to be greater public awareness of digital risks and threats, and better knowledge of digital security tools and methods (thankfully, there is beginning to be a shift from tools like Facebook Messenger toward tools like Telegram and Signal). And with regard to protest movements and campaigns, there need to be more trustworthy secure messaging platforms and channels available in Khmer.

Swahili Language Team

Presentation Slides

Challenges and Emerging Threats

Swahili is practically invisible online, accounting for only about 0.08% of content. Few technical glossaries are available in Swahili, and those that exist are not harmonized with one another. Additionally, very few resources are hosted locally. Accessing content hosted overseas (a huge proportion of the internet) means high latency, which in turn hampers adoption and degrades the user experience.

A slide from the Swahili Language Team’s presentation.

Our Swahili language coordinator noted that the contributor community faces a real challenge with turnover and volunteer burnout, which threatens the sustainability of localization projects.

In terms of the COVID-19 pandemic, there are concerns about contact tracing apps, data collection, and privacy breaches. Additionally, governments are working to limit information shared about the coronavirus, especially information that challenges official government positions or narratives.

For more on the challenges facing Swahili-speaking communities on the internet, please see “Making Swahili visible: Identity, language, and the internet.”

Amharic Language Team

Challenges and Emerging Threats

Levels of digital literacy are low in Ethiopia and in the Amharic-speaking community. Still, social media is widely used, and hate speech and misinformation spread rampantly, contributing to significant violence in recent years. There is a lack of language-accessible materials to educate people about misinformation and hate speech.

The problem extends beyond the general public. Journalists are largely unaware of fact-checking tools, and few such tools may be available in Amharic to begin with. And as ethnically affiliated media outlets with online followings in the millions spread hate speech and misinformation, there is real concern that the upcoming elections will bring violence and digital rights suppression, such as an internet shutdown.

Needs

Our Amharic language coordinator highlighted the need for guides, written in Amharic, that speakers can read and share to improve their digital literacy, learn to fact-check, and become more aware of misinformation and hate speech. Tools that help people avoid censorship and shutdowns, as well as fact-checking tools, should also be made available in Amharic.

Francophone Africa Language Team

Presentation Slides

Challenges and Emerging Threats

In Francophone Africa there is a lack of technical terminology in national and local languages (and in some cases in French), as well as a lack of resources in those languages, especially educational resources on information and communication technologies (ICT). The practice of localization has been challenging, too. It has been difficult to maintain a motivated and dedicated community of contributors, and a lack of synergy between localization contributors and digital security experts has been a barrier. A lack of financial means has hindered the community’s ability to produce the resources it needs.

In the coming year, continued or reinforced lockdowns related to COVID-19 could lead to repression, including censorship. At the same time, military regimes in Mali and Chad are in a potential transition period, with elections scheduled soon in both places. Shutdowns or other disruptions may take place around those events.

Needs

These language communities need financial support to create and localize important resources, including technical glossaries in community languages and pedagogical resources for the most widely used digital security tools in French-speaking countries.

They also need ways to connect security professionals with language teams and at-risk groups such as journalists, activists, and bloggers. This should include capacity building for digital security professionals, including trainers, to support the implementation of local, community-driven projects. Additionally, training should be actively offered to girls and women, journalists, human rights defenders, and political opponents or dissidents.

Slides from the Francophone Africa Language Team’s presentation.

Indian Minority Language Communities

Presentation Slides

Long-term Challenges

Keyboards have long been a challenge for minority languages in India. No virtual keyboard is available for Kashmiri, and physical keyboards are hard to find for most minority languages there. Localized content is also hard to find for languages with more speakers than readers or writers (as with Assamese, Kashmiri, Konkani, and Manipuri). Technical glossaries are unavailable in Manipuri and Kashmiri.

Many languages don’t have an existing localization community to speak of. This is the case for Assamese, Konkani, Manipuri, and Kashmiri, all of which would need long-term support for internet access, contributors’ time, and more.

Needs

Some of the language communities live in areas of constant mass and targeted surveillance, so secure communication tools, anonymous browsers, VPNs and other circumvention tools that operate in their languages would be essential. Additionally, very few training manuals are available in minority languages in India.

The COVID-19 pandemic has only increased the need to have these tools in local languages, both to bypass censorship and to securely and privately organize protests.

Slides from the Indian Minority Languages Coordinator’s presentation.

Indonesian Language Team

Localization Challenges

It has been difficult for the Indonesian language team to retain contributors after a sprint, whether the work is paid or unpaid. Communication with tool developers and content developers can also be irregular and hard to establish. While comments on the Transifex platform are useful for documenting issues for other contributors, they are often ineffective for getting answers from developers, such as the context for a given source string. Direct contact through a different channel, like Signal or Mattermost, works better, and communication should be established there when possible.

COVID-related Challenges

Our coordinator mentioned that once lockdowns began, all education shifted online, but people in rural areas often have trouble accessing educational materials because of poor connectivity. On top of that, the COVID tracing apps are ineffective and may not be trustworthy. There are several of them, they do not appear to track movement from one city to another, and the government does not seem to be using them in any meaningful or effective way. The government is assumed to be collecting and storing large amounts of data from these apps, but because of a lack of transparency around them, it is difficult to know what is being collected and how it is being used.

Participants & Attendees

Language Teams Represented

Akan

Amharic

Arabic

Assamese

French (Francophone Africa)

Ga

Hindi

Igbo

Indonesian

Kashmiri

Khmer

Konkani

Manipuri

Portuguese (Brazil)

Russian

Shona

Spanish (Latin America)

Swahili

Telugu

Thai

Tibetan

Yoruba

Community Partner Groups Represented

Center for Advancement of Rights and Democracy (CARD), Ethiopia

Center for Youth Empowerment and Leadership (CYEL), Kenya

Colnodo, Colombia

Electronic Frontier Foundation

EngageMedia

Front Line Defenders

Global Voices

ISC Project

Jokkolabs Banjul

Least Authority

Media Institute of Southern Africa, Zimbabwe

OKthanks

OONI

Open Society Foundations

Psiphon

Ranking Digital Rights

Rudi International

The Guardian Project

Thai Netizen Network

Tibet Action Institute