Top Stories

Google's Gemini is not the general AI model we anticipated

Gemini, Google's much-awaited next-generation general AI model, is now available. Sort of.


The model that debuted this week, Gemini Pro, is simply a lightweight variant of a more potent, more capable Gemini model that is expected to arrive sometime next year. But I'm getting ahead of myself.


Members of the Google DeepMind team, the unit responsible for Gemini along with Google Research, gave a high-level overview of Gemini (officially known as "Gemini 1.0") and its capabilities at a virtual press conference yesterday.


It turns out that Gemini is not just one AI model, but a family of them. It is available in three flavors:


- Gemini Ultra, the flagship Gemini model
- Gemini Pro, a "lite" Gemini variant
- Gemini Nano, which is optimized to run on mobile devices such as the Pixel 8 Pro and, adding to the confusion, comes in two model sizes: Nano-1 (1.8 billion parameters) for low-memory devices and Nano-2 (3.25 billion parameters) for high-memory devices
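For a rough sense of why the two Nano sizes map to low- and high-memory devices, here's a back-of-envelope memory estimate. The parameter counts are Google's; the 16-bit and 4-bit weight precisions are my own assumptions, since Google hasn't said how Nano is quantized on-device.

```python
# Back-of-envelope RAM footprint for the two Gemini Nano sizes.
# Parameter counts are from Google's announcement; the weight precisions
# (fp16 and int4) are assumptions, not something Google has confirmed.

GIB = 1024 ** 3

models = {"Nano-1": 1.8e9, "Nano-2": 3.25e9}  # parameters

for name, params in models.items():
    fp16_gib = params * 2.0 / GIB   # 2 bytes per weight at 16-bit
    int4_gib = params * 0.5 / GIB   # 0.5 bytes per weight at 4-bit
    print(f"{name}: ~{fp16_gib:.1f} GiB at fp16, ~{int4_gib:.1f} GiB at int4")

# Nano-1: ~3.4 GiB at fp16, ~0.8 GiB at int4
# Nano-2: ~6.1 GiB at fp16, ~1.5 GiB at int4
```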


The easiest place to try Gemini Pro, at least in the United States, is Bard, Google's rival to ChatGPT, which now runs a fine-tuned version of Gemini Pro. It works only in English and only for text, not images. Sissie Hsiao, Google's VP and general manager of Assistant and Bard, said during the briefing that the fine-tuned Gemini Pro brings superior reasoning, planning and understanding compared with the previous model powering Bard.


I should point out that none of those improvements can be independently verified yet. Google did not give a live demo at the briefing, and it declined to let journalists test the models before the announcement.


On December 13, Gemini Pro will also launch for business users on Vertex AI, Google's fully managed machine learning platform, and will then come to Generative AI Studio, Google's developer suite. (Earlier, some inquisitive users spotted Gemini model versions on display in Vertex AI's model garden.) In the coming months, Gemini will also arrive in other Google products, including Chrome, Duet AI, ads and Search, where it will power part of the Search Generative Experience.


Meanwhile, Android developers interested in building the model into their apps can sign up now for a sneak preview of Gemini Nano, which will soon roll out via AICore, Google's newly released system app. For the time being, AICore is available only on the Pixel 8 Pro running Android 14. Gemini Nano will power features that Google teased at the Pixel 8 Pro unveiling in October, such as summarization in the Recorder app and suggested replies in supported messaging apps (starting with WhatsApp), first on the Pixel 8 Pro and later on other Android smartphones.


Natively multimodal


There isn't much to brag about with Gemini Pro, or at least the fine-tuned version that now powers Bard.


According to Hsiao, Gemini Pro outperforms OpenAI's GPT-3.5, the predecessor to GPT-4, on six benchmarks, including one that measures grade-school math reasoning (GSM8K), and is better at tasks such as summarization, brainstorming and writing. But GPT-3.5 is more than a year old; surpassing it is hardly a remarkable milestone at this point.
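For context, GSM8K is a set of grade-school word problems graded by exact match on the final numeric answer. Here's a minimal sketch of that grading convention; the sample problem and model output are illustrative stand-ins, not actual benchmark items.

```python
import re

# Minimal sketch of GSM8K-style exact-match grading. The problem and the
# model completion below are illustrative stand-ins, not real benchmark data.

def final_number(text: str) -> str:
    """Return the last number in a completion, the usual GSM8K convention."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else ""

problem = ("A classroom has 4 shelves with 12 books each. "
           "If 9 books are checked out, how many books remain?")
gold_answer = "39"

model_completion = "4 * 12 = 48 books in total; 48 - 9 = 39 books remain."
print(final_number(model_completion) == gold_answer)  # True
```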


How does Gemini Ultra fare, then? Surely it's more impressive, right?


Kind of.


Like Gemini Pro, Gemini Ultra was pre-trained and fine-tuned on a large corpus of codebases, text in many languages, audio, images and video, making it "natively multimodal." According to DeepMind VP of product Eli Collins, Gemini Ultra can understand "fine-grained" information in text, images, audio and code, and can answer questions about "complex" topics such as math and physics.


In this respect, Gemini Ultra improves on OpenAI's multimodal model, GPT-4 with Vision, which can only understand the context of two modalities: words and images. Gemini Ultra can transcribe speech and answer questions about audio and video (for example, "What's happening in this clip?") in addition to images and artwork.


"Training separate components for different modalities is the standard approach to building multimodal models," Collins said during the briefing. Such models can perform well at certain tasks, like describing images, but they falter on tasks that require more sophisticated conceptual understanding and reasoning. Gemini, by contrast, was designed to be natively multimodal.
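To make Collins' distinction concrete, here's a toy sketch of the two designs. It's purely illustrative (Gemini's internals aren't public beyond the whitepaper), but it shows the difference between gluing separate per-modality components together and feeding one model a single interleaved token stream.

```python
# Toy contrast between "separate components" and "natively multimodal"
# designs. Purely illustrative; the stub functions stand in for real models.

# --- Late fusion: separate components glued together ---
def caption_image(image) -> str: ...          # a standalone vision model
def text_model(prompt: str) -> str: ...       # a standalone language model

def late_fusion_answer(image, question: str) -> str:
    # The language model never sees the pixels, only a lossy description,
    # which is where complex cross-modal reasoning tends to break down.
    return text_model(f"Image description: {caption_image(image)}\n{question}")

# --- Natively multimodal: one model, one interleaved token stream ---
def image_tokens(image) -> list[int]: ...     # e.g. patch embeddings
def text_tokens(text: str) -> list[int]: ...
def one_model(tokens: list[int]) -> str: ...  # a single multimodal model

def native_answer(image, question: str) -> str:
    # One model attends jointly over image and text tokens, so
    # cross-modal reasoning happens inside a single forward pass.
    return one_model(image_tokens(image) + text_tokens(question))
```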


I wish I could tell you more about Gemini's training dataset; I'm curious, too. But Google repeatedly declined to answer journalists' questions about how Gemini's training data was collected, where it came from and whether any of it was licensed from third parties.


Collins said that at least some of the data comes from public web sources, and that Google "filtered" it for quality and "inappropriate" material. But he did not address the more important question: whether creators who unwittingly contributed to Gemini's training data can opt out or demand compensation.


Google isn't alone in keeping its training data close to the vest. The data is a competitive advantage, and it is also a potential source of fair-use disputes: a number of generative AI vendors, including Microsoft, GitHub, OpenAI and Stability AI, have been sued over claims that they broke IP law by training their AI systems on copyrighted content, such as e-books and artwork, without crediting or paying the creators.


OpenAI has said it will let artists opt out of the training datasets for its future art-generating models, joining a handful of other generative AI vendors. Google does not appear to offer a comparable option for its models, and Gemini seemingly won't change that stance.


Gemini was trained on Google's in-house AI chips, Tensor Processing Units (TPUs), specifically TPU v4 and v5e (with v5p to come), and Gemini models are served on a combination of TPUs and GPUs. Gemini Pro took "a matter of weeks" to train, and a technical whitepaper published this morning implies that Gemini Ultra likely took longer. Collins said that Gemini is Google's "most efficient" large model to date and that it is "significantly cheaper than its multimodal predecessors," but he would not say how much the training cost, how many chips were used or what the environmental impact was.


According to one estimate, training a model the size of GPT-4 emits around 15 metric kilotons of CO2, roughly the annual emissions of almost 1,000 Americans. It would have been reassuring to hear that Google had taken steps to mitigate the impact, but who can say, given that the company declined to discuss the matter, at least at the briefing this writer attended?
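For what it's worth, that comparison checks out arithmetically if you take a commonly cited U.S. per-capita figure of roughly 15 metric tons of CO2 per year, which is my assumption here, not a number Google provided:

```python
# Sanity check on the emissions comparison in the estimate cited above.
# The per-capita figure is an assumption (a commonly cited U.S. average).

training_emissions_tons = 15_000   # ~15 metric kilotons of CO2
us_per_capita_tons = 15.2          # approx. annual U.S. emissions per person

print(training_emissions_tons / us_per_capita_tons)  # ~987, "almost 1,000"
```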


A somewhat improved model


In a pre-recorded demo, Google showed how Gemini can be used to help with physics homework, solving worksheet problems step by step and flagging possible mistakes in already filled-in answers.


Another pre-recorded demo showed Gemini identifying scientific papers relevant to a particular problem set, then extracting data and formulae from those papers to "update" a chart from one of them with more recent data.


"You can think of the work here as an extension of the work [DeepMind] started with chain-of-thought prompting, which is that, with further instruction tuning, you can get the model to follow [more complex] instructions," Collins said. "If you think about the physics homework example, you can give the model a picture along with instructions to follow, such as pointing out a math mistake in the homework. So the model can handle more complicated prompts."
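Chain-of-thought prompting, for the record, just means eliciting intermediate reasoning steps before a final answer. A prompt for the physics-homework scenario Collins describes might look something like the sketch below; the `generate` stub and the message format are hypothetical placeholders, not Google's actual Gemini API.

```python
# Sketch of a chain-of-thought style multimodal prompt for the
# physics-homework example. `generate` and the message format are
# hypothetical placeholders, not Google's actual Gemini API.

def generate(messages: list[dict]) -> str:
    raise NotImplementedError("stand-in for a multimodal model call")

messages = [
    {"type": "image", "path": "homework_page.jpg"},  # photo of the worksheet
    {"type": "text", "text": (
        "Work through each problem step by step, showing your reasoning "
        "before giving a final answer. If a filled-in answer contains a "
        "math mistake, point out the exact step where it goes wrong."
    )},
]

# answer = generate(messages)  # would return the model's worked solution
```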


Throughout the briefing, Collins repeatedly touted Gemini Ultra's benchmark superiority, claiming that the model exceeds current state-of-the-art results on "30 of the 32 widely used academic benchmarks used in large language model research and development." But the published results make clear that on several of those benchmarks, Gemini Ultra beats GPT-4 and GPT-4 with Vision only by a hair.


On GSM8K, for instance, Gemini Ultra solves 94.4% of the math problems correctly, versus 92% for GPT-4. On the DROP reading comprehension benchmark, Gemini Ultra narrowly beats GPT-4, 82.4% to 80.9%. On VQAv2, a "natural" image understanding test, Gemini Ultra edges out GPT-4 with Vision by just 0.6 percentage points. And Gemini Ultra's win over GPT-4 on the Big-Bench Hard reasoning suite is a slim 0.5 percentage points.


Collins noted that Gemini Ultra achieves a "state-of-the-art" score of 59.4% on MMMU, a new benchmark for multimodal reasoning, beating GPT-4 with Vision. On HellaSwag, a test of common-sense reasoning, however, Gemini Ultra actually does worse than GPT-4, scoring 87.8% to GPT-4's 95.3%.
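Putting the reported scores side by side makes the thinness of those margins plain; every number below is one cited above:

```python
# Percentage-point gaps between Gemini Ultra and GPT-4, using only the
# scores reported above.

scores = {  # benchmark: (Gemini Ultra, GPT-4)
    "GSM8K":     (94.4, 92.0),
    "DROP":      (82.4, 80.9),
    "HellaSwag": (87.8, 95.3),  # the one Gemini Ultra loses
}
# Reported only as gaps or single scores:
#   VQAv2: +0.6 points over GPT-4 with Vision
#   Big-Bench Hard: +0.5 points over GPT-4
#   MMMU: 59.4% for Gemini Ultra; no GPT-4 with Vision score was given

for bench, (gemini, gpt4) in scores.items():
    print(f"{bench}: {gemini - gpt4:+.1f} points for Gemini Ultra")
# GSM8K: +2.4 | DROP: +1.5 | HellaSwag: -7.5
```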


Asked by a reporter whether Gemini Ultra hallucinates, that is, confidently fabricates facts, Collins said the issue is "not a solved research problem." Take that as you will.


Bias and toxicity are also likely to persist in Gemini Ultra, given that even the best generative AI models available today respond in harmful and biased ways to certain prompts. And, like previous general AI models, it is most likely Anglocentric: Collins said that although Gemini Ultra can translate between around 100 languages, no specific work has been done to localize the model to countries in the Global South.


Another noteworthy limitation: although the Gemini Ultra architecture supports image generation (as does Gemini Pro's, in principle), that capability will not make it into the productized model at launch. That is probably because the mechanism is slightly more complex than ChatGPT's; rather than feeding prompts to a separate image generator (DALL-E 3, in ChatGPT's case), Gemini outputs images "natively," without an intermediate step.
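In rough pseudocode terms, the difference between the two approaches looks like this. It's an illustrative sketch only, since neither system's internals are public at this level of detail:

```python
# Illustrative contrast between pipelined and native image generation.
# All functions are stubs; neither system's internals are public.

def dalle3(prompt: str) -> bytes: ...            # a separate image generator
def rewrite_prompt(prompt: str) -> str: ...      # the language model's role

def pipelined_image(prompt: str) -> bytes:
    # ChatGPT-style: the language model only refines the prompt, then
    # hands off to a dedicated generator (DALL-E 3, in ChatGPT's case).
    return dalle3(rewrite_prompt(prompt))

def emit_image_tokens(prompt: str) -> list[int]: ...
def decode_tokens_to_image(tokens: list[int]) -> bytes: ...

def native_image(prompt: str) -> bytes:
    # Gemini-style: the same multimodal model emits image tokens directly,
    # with no intermediary generator in the loop.
    return decode_tokens_to_image(emit_image_tokens(prompt))
```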


Collins would say only that the work is "ongoing," offering no timeline for when image generation might arrive.


Rushed out the gate


The "launch" of Gemini this week seems a tad hastily put together.


At its annual I/O developer conference earlier this year, Google teased Gemini, claiming "impressive multimodal capabilities not seen in prior models" and "[efficiencies] in tool and API integration." And in a June interview with Wired, Demis Hassabis, the CEO and co-founder of DeepMind, said Gemini would bring new capabilities to the text-generating AI space, such as planning and problem-solving.


Gemini Ultra may well be capable of all this and more. But given Google's earlier generative AI stumbles, yesterday's briefing needed to be reassuring. It wasn't.


Google has been playing catch-up in generative AI since the start of the year, when OpenAI's viral hit ChatGPT took off. Bard launched in February to criticism over its incorrect responses to simple questions, and Google employees, including the ethics team, reportedly had concerns about the rushed launch timeline.


Subsequently, reports emerged that Google had used underpaid, overworked contractors from Appen and Accenture to annotate Bard's training data. The same may be true of Gemini: Google did not deny it yesterday, and the technical whitepaper says only that annotators were paid "at least the local living wage."


To be fair, Google is making headway: Bard has improved dramatically since its debut, and the company has successfully brought new generative AI features, powered by homegrown models such as PaLM 2 and Imagen, to dozens of its products, apps and services.


However, reports suggest that Gemini's development has been troubled.


Gemini has reportedly struggled with tasks such as reliably handling non-English queries, which contributed to delays in the launch, and this despite the direct involvement of high-ranking Google executives, including Jeff Dean, the company's most senior AI research executive. (Google says Gemini Ultra will first be available to a select set of customers, developers, partners and "safety and responsibility experts" before rolling out to developers and enterprise customers, and then to Bard "early next year.") Collins admitted that Google itself has not fully mapped Gemini Ultra's expanded capabilities, and that the team has not yet developed a plan for commercializing Gemini. (I suspect that won't take long, given the high cost of training and serving AI models.)


So Gemini Pro is what we're left with for now, and Gemini Ultra may yet underwhelm, particularly if its context window stays at the roughly 24,000 words specified in the technical whitepaper. (The context window is the text the model considers before generating additional text.) GPT-4 handily exceeds that, with a context window of around 100,000 words. Context window isn't everything, though, so we'll reserve judgment until we have our hands on the model.
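Those word counts line up with token limits under the usual rule of thumb of roughly 0.75 English words per token. The 32K-token figure below is my inference from the whitepaper's ~24,000-word number, and the GPT-4 comparison assumes GPT-4 Turbo's 128K-token window:

```python
# Rough token-to-word conversion behind the context window figures above.
# The 0.75 words/token ratio is a rule of thumb, not an exact rate, and the
# GPT-4 number assumes the 128K-token GPT-4 Turbo variant.

WORDS_PER_TOKEN = 0.75

gemini_context_tokens = 32_768       # inferred from the whitepaper's figure
gpt4_turbo_context_tokens = 128_000

print(gemini_context_tokens * WORDS_PER_TOKEN)      # 24576.0, the "~24,000 words"
print(gpt4_turbo_context_tokens * WORDS_PER_TOKEN)  # 96000.0, the "~100,000 words"
```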


Is Google's marketing to blame for the messiness of today's releases, having suggested that Gemini would be something extraordinary rather than a modest nudge of the general AI needle? Maybe. Or maybe building state-of-the-art generative AI models is simply hard, even after reorganizing your entire AI division to streamline the process.


