Building AI with MongoDB: How Devnagri Brings the Internet to 1.3 Billion People with Machine Translations

Mat Keep
January 23, 2024 | Updated: January 30, 2025
#genAI #Vector Search

While on a trip to Japan, Himanshu Sharma, later to become CEO of Devnagri, made an observation that drew parallels with his native India. Although most of Japan’s population did not speak English, people there were still well served by an internet that was largely written in English. The key was translation, and specifically early automated machine translation. And so the idea to found Devnagri, India’s first AI-powered translation platform, was born.

“In India, 90% of the population are not fluent in English. That is close to 1.3 billion people. We wanted to bridge this gap to make it easy for non-English speakers to access the internet in their native languages. There are more than 22 Indian languages in use, but they represent just 0.1% of data on the internet,” says Sharma.

“We want to give people the same access to knowledge and education in their native languages so that they can be part of the digital ecosystem. We wanted to help businesses and the government reach real people who were not online because of the language barrier.”

Check out our AI Learning Hub to learn more about building AI-powered apps with MongoDB.

Figure 1: Devnagri’s real-time translation engine helps over 100 Indian brands connect with their customers over digital channels for the first time.

Building India’s first machine translation platform

Sharma and his team at Devnagri have developed an AI-powered translation platform that can accept multiple file formats from different industry domains. Conceptually it is similar to Google Translate, but rather than serving as a general consumer tool, it focuses on the four industries that together have the largest impact on the everyday lives of Indian citizens: e-learning, banking, e-commerce, and media publishing. Devnagri provides API access to its platform and a plug-and-play solution for dynamically translating applications and websites.

As Sharma explains, “Our platform is built on our own custom transformer model based on the MarianNMT neural machine translation framework. We train on corpuses of content in documents, chunking them into sentences and storing them in MongoDB Atlas. We use in-context learning for training, which is further augmented with reinforcement learning from human feedback (RLHF) to further tune for precise accuracy.”
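The chunking step Sharma describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Devnagri's pipeline: the regex-based splitter, the `chunk_into_sentences` name, and the record field names are all hypothetical, and a production system would likely use a language-aware sentence segmenter.

```python
import re

# Naive sentence boundary (an assumption): '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_into_sentences(doc_id, text):
    """Split one document into sentence-level records ready for storage."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    return [
        # Field names are illustrative, not taken from the article.
        {"doc_id": doc_id, "position": i, "source_text": sentence}
        for i, sentence in enumerate(sentences)
    ]


chunks = chunk_into_sentences(
    "kyc-form", "Open an account. Upload your ID! Is it valid?"
)
for chunk in chunks:
    print(chunk["position"], chunk["source_text"])
```

With a driver such as PyMongo, a list of records like this could be written to a collection in a single `insert_many` call.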

Sharma goes on to say, “We run on Google Vertex AI, which handles our MLOps pipeline across both model training and inferencing. We use Google Tensor Processing Units (TPUs) to host our models so we can translate content (such as web pages, PDFs, documentation, web and mobile apps, images, and more) for users on the fly in real time.”

While the custom transformer-based models have served the company well, recent advancements in off-the-shelf models are leading Devnagri’s engineers to consider a switch. They are evaluating a move to OpenAI GPT-4 and the Llama-2-7b foundation models, fine-tuned with the past four years of machine translation data captured by Devnagri.

Why MongoDB? Flexibility and performance

MongoDB is the database platform behind Devnagri’s machine translation models. For each sentence chunk, MongoDB stores the source English-language version, the machine translation, and, if applicable, the human-verified translation.
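One way to picture the per-sentence record described above is the sketch below. The article names only the three pieces of text that are stored; the class name, field names, and the `target_lang` field are assumptions made for illustration, and the actual schema is not given in the source.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SentenceChunk:
    """Hypothetical shape of one stored sentence chunk."""

    source_en: str                        # source English sentence
    machine_translation: str              # model output
    target_lang: str                      # assumed field, e.g. "hi" for Hindi
    human_verified: Optional[str] = None  # present only after human review

    def best_translation(self) -> str:
        """Prefer the human-verified text when a reviewer has supplied one."""
        return self.human_verified or self.machine_translation

    def to_document(self) -> dict:
        """Return a plain dict, omitting human_verified when it is absent."""
        doc = asdict(self)
        if doc["human_verified"] is None:
            del doc["human_verified"]
        return doc


unreviewed = SentenceChunk("Thank you", "शुक्रिया", "hi")
reviewed = SentenceChunk("Thank you", "शुक्रिया", "hi", human_verified="धन्यवाद")
print(unreviewed.to_document())
print(reviewed.best_translation())
```

Leaving out the optional field rather than storing a null is one of the conveniences of a document model: records that have been reviewed and records that have not can sit in the same collection without a fixed schema.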

As Sharma explains, “We use the sentences stored in MongoDB to train our models and support real-time inference. The flexibility of its document data model made MongoDB an ideal fit to store the diversity of structured and unstructured content and features our ML models translate.”

“We also exploit MongoDB’s scalable distributed architecture. This allows our models to parallelize read and write requests across multiple nodes in the cloud, dramatically improving training and inference throughput. We get faster time to market with higher quality results by using MongoDB.”
Himanshu Sharma, Devnagri co-founder and CEO
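The article does not say how Devnagri issues its parallel requests, so the following is only a client-side sketch of the general idea: split records into batches and send the batches concurrently, leaving the database to distribute the work across its nodes. The `batched` and `parallel_write` helpers, the batch size, and the in-memory `fake_write` stand-in are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def parallel_write(records, write_batch, batch_size=500, workers=4):
    """Send batches to `write_batch` concurrently; return the total written."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(write_batch, batched(records, batch_size)))


# In-memory stand-in for a real bulk write, so the sketch runs without a database.
store = []


def fake_write(batch):
    store.extend(batch)
    return len(batch)


written = parallel_write([{"n": i} for i in range(1200)], fake_write)
print(written, "records written in", -(-1200 // 500), "batches")
```

In a real deployment, `write_batch` would wrap a driver call such as PyMongo's `insert_many`; PyMongo's `MongoClient` is documented as thread-safe, so a single client can be shared across the worker threads.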

What's next?

Today Devnagri serves over 100 brands and several government agencies in India. The company has also joined MongoDB’s AI Innovators Program. The program provides its data science team with access to free Atlas credits to support further machine translation experiments and development, along with access to technical guidance and best practices.

Head over to our quick-start guide to get started with Atlas Vector Search today.

