Node.js example:

const { OpenAI } = require('openai');

const main = async () => {
  // Point the OpenAI-compatible SDK at the AI/ML API endpoint.
  const api = new OpenAI({ apiKey: '', baseURL: 'https://api.ai.cc/v1' });

  const text = 'Your text string goes here';
  const response = await api.embeddings.create({
    input: text,
    model: 'textembedding-gecko-multilingual@001',
  });

  // The 768-dimensional embedding vector for the input text.
  const embedding = response.data[0].embedding;
  console.log(embedding);
};

main();
Python example:

import json

from openai import OpenAI


def main():
    # Point the OpenAI-compatible SDK at the AI/ML API endpoint.
    client = OpenAI(
        base_url="https://api.ai.cc/v1",
        api_key="",  # your AI/ML API key
    )

    text = "Your text string goes here"

    response = client.embeddings.create(
        input=text,
        model="textembedding-gecko-multilingual@001",
    )

    # The 768-dimensional embedding vector for the input text.
    embedding = response.data[0].embedding
    print(json.dumps(embedding, indent=2))


main()
Textembedding-gecko-multilingual@001

Google's textembedding-gecko-multilingual@001 Model

The textembedding-gecko-multilingual@001 model, launched by Google on April 30, 2024, represents a significant advancement in natural language processing (NLP). As a state-of-the-art text embedding model, it specializes in transforming diverse textual data into precise numerical vector representations, effectively capturing semantic meanings and relationships across numerous languages.

✨ Key Capabilities & Features

  • High Capacity: Supports up to 3,072 input tokens, allowing for comprehensive text analysis.
  • Vector Output: Generates detailed 768-dimensional vector embeddings, ideal for nuanced semantic understanding.
  • Benchmarked Excellence: Achieves superior performance on the Massive Text Embedding Benchmark (MTEB), setting new industry standards.
  • Innovative Training: Leverages a novel fine-tuning dataset (FRet) to enhance query and passage generation capabilities.
  • Multilingual Support: Engineered for broad language coverage, including Arabic, Bengali, Chinese, English, French, Hindi, and Spanish.
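
The 3,072-token input limit above is worth guarding against before calling the API. A minimal sketch, using whitespace-separated words as a crude stand-in for real model tokens (the actual tokenizer counts differently, so leave headroom; `truncate_for_embedding` is an illustrative helper, not part of any SDK):

```python
MAX_TOKENS = 3072  # input capacity of textembedding-gecko-multilingual@001


def truncate_for_embedding(text: str, limit: int = MAX_TOKENS) -> str:
    """Crudely cap input length by whitespace-separated word count.

    NOTE: words are only a rough proxy for model tokens; real token
    counts are usually higher, so keep a safety margin in practice.
    """
    words = text.split()
    if len(words) <= limit:
        return text
    return " ".join(words[:limit])


print(len(truncate_for_embedding("word " * 5000).split()))  # 3072
```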

💡 Intended Applications

This versatile model is designed to empower a wide range of NLP applications:

  • 🔍 Semantic Search: Improve relevance and accuracy in search results by understanding intent.
  • 🏷️ Text Classification: Efficiently categorize documents and text snippets.
  • 📚 Document Retrieval: Enhance the discovery of relevant information across large datasets.
  • 📊 Clustering & Recommendation: Group similar items and provide personalized suggestions.
  • 🚨 Outlier Detection: Identify anomalies or unusual patterns in textual data.
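
Several of these applications reduce to comparing embedding vectors by cosine similarity. A minimal semantic-search sketch, using tiny 4-dimensional toy vectors in place of the model's real 768-dimensional embeddings (the document names and numbers here are illustrative, not actual model output):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Toy 4-dimensional stand-ins for the model's 768-dimensional embeddings.
doc_embeddings = {
    "refund policy": [0.9, 0.1, 0.0, 0.1],
    "shipping times": [0.1, 0.9, 0.1, 0.0],
    "account deletion": [0.0, 0.1, 0.9, 0.1],
}
query_embedding = [0.85, 0.15, 0.05, 0.1]  # e.g. "how do I get my money back?"

# Rank documents by similarity to the query embedding.
ranked = sorted(doc_embeddings,
                key=lambda d: cosine(query_embedding, doc_embeddings[d]),
                reverse=True)
print(ranked[0])  # refund policy
```

In production the toy vectors would be replaced by embeddings returned from the `embeddings.create` calls shown in the code samples above, typically stored in a vector database rather than a dict.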

Technical Specifications

Architecture

The textembedding-gecko-multilingual@001 model utilizes a dense vector representation architecture, characteristic of advanced large language models (LLMs). It employs sophisticated deep learning methodologies to produce embeddings that accurately reflect the intricate semantic context of any input text.

Training Data & Diversity

Trained on a diverse dataset generated through a unique two-step LLM process, the model first generates queries and relevant passages, then ranks them to create a robust fine-tuning dataset. This ensures broad task coverage and enhanced performance. While diversity is a key design principle to mitigate biases, continuous evaluation is vital to address any emerging biases from the training data.

Knowledge Cutoff

The model's knowledge base is current as of April 2024, reflecting the latest information available at that time.

🚀 Unparalleled Performance Metrics

The textembedding-gecko-multilingual@001 model showcases exceptional performance, particularly on the Massive Text Embedding Benchmark (MTEB). This comprehensive benchmark evaluates models across seven categories and 56 datasets.

📊 Average MTEB Score: 66.31 with 768-dimensional embeddings.

This outstanding score positions it as a market leader, outperforming models up to 7 times larger and those with higher-dimensional embeddings (up to 4096 dimensions), all while maintaining a compact size of just 1.2 billion parameters.

Task-Specific Excellence

The model demonstrates superior capabilities across core NLP tasks:

  • 🏷️ Text Classification: 81.17
  • ↔️ Semantic Textual Similarity: 85.06
  • 📝 Summarization: 32.63
  • 🔎 Retrieval Tasks: 55.70

Zero-Shot Generalization

A notable feature is its strong zero-shot generalization capability, especially when trained exclusively on the synthetic FRet dataset. This allows it to effectively adapt to unseen tasks without prior exposure to specific datasets, often outperforming various competitive baselines.
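
Zero-shot behavior with embeddings can be approximated by embedding the candidate labels themselves and assigning each text to its nearest label vector. A minimal sketch with 3-dimensional toy vectors standing in for real 768-dimensional embeddings (labels and values are illustrative):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Toy 3-dimensional stand-ins for real 768-dimensional label embeddings.
label_embeddings = {
    "sports": [1.0, 0.1, 0.0],
    "finance": [0.1, 1.0, 0.1],
    "politics": [0.0, 0.1, 1.0],
}
text_embedding = [0.2, 0.9, 0.1]  # e.g. embedding of a market report

# Assign the label whose embedding is most similar to the text's.
label = max(label_embeddings,
            key=lambda name: cosine(text_embedding, label_embeddings[name]))
print(label)  # finance
```

No task-specific training data is needed for this scheme: adding a new class is just embedding one more label string.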

🛠️ How to Use & Access

Integration & Code Samples

The textembedding-gecko-multilingual@001 model is readily available on the AI/ML API platform and can be integrated into your applications using the Node.js and Python code samples shown at the top of this page.

For more details, visit the AI/ML API Platform.

Comprehensive API Documentation

Detailed guidance for integration and usage is available through the official API Documentation provided on the AI/ML API website.

🛡️ Ethical Use & Licensing

Ethical AI Guidelines

The development and deployment of textembedding-gecko-multilingual@001 adhere strictly to ethical AI principles. Developers are strongly encouraged to carefully consider the implications of using embedding models, especially concerning data privacy, security, and potential algorithmic biases in their applications.

Licensing Information

The textembedding-gecko-multilingual@001 model is not open-sourced. Its usage is governed by specific licensing agreements established by Google. Users must review the associated terms of service and privacy policies to ensure compliance.

❓ Frequently Asked Questions (FAQ)

Q1: What is textembedding-gecko-multilingual@001?

It's a state-of-the-art text embedding model developed by Google, designed to convert text into numerical vector representations that capture semantic meaning across multiple languages.

Q2: Which languages does the model support?

The model provides multilingual support for a wide range of languages, including but not limited to Arabic, Bengali, Chinese, English, French, Hindi, and Spanish.

Q3: How does it perform compared to other models?

It achieves an average score of 66.31 on the MTEB benchmark, outperforming larger models and those with higher-dimensional embeddings while being more compact.

Q4: What are the primary use cases for this model?

Its intended uses include semantic search, text classification, document retrieval, clustering, recommendation systems, and outlier detection.

Q5: Is textembedding-gecko-multilingual@001 an open-source model?

No, the model is not open-sourced. Its usage is subject to specific licensing agreements defined by Google, and users should review the terms of service.

Learn how you can transform your company with AICC APIs

Discover how to revolutionize your business with the AICC API! Unlock powerful tools to automate processes, enhance decision-making, and personalize customer experiences.
Contact sales