The field of AI-powered media generation is evolving at a dizzying pace, and video generation has become one of its most active and competitive frontiers. In this context, Google has presented Veo 2, the evolution of its Veo 1 model and its flagship proposal to compete in this emerging space. Developed by Google DeepMind, Veo 2 is positioned as a state-of-the-art model designed to produce high-quality, realistic video, with the stated aim of offering "unprecedented creative control."
Veo 2 arrives at a moment of intense competition, with key players such as OpenAI's Sora, Runway, Kling and others driving innovation at a remarkable speed. Google claims that Veo 2 redefines quality and control in AI video generation, with the potential to significantly transform creative workflows across various industries.
This article offers a detailed analysis of Google Veo 2. We examine its availability across the different Google platforms, its technical specifications and the key improvements over its predecessor, Veo 1. We also address the model's current limitations, compare it with Veo 1 and its main competitors, gather early opinions from experts and users, and evaluate Google's approach to its development and deployment.
Accessing Veo 2: platforms, pricing and availability
Google's launch strategy for Veo 2 is characterized by a gradual and fragmented deployment. It began with private previews for selected creators and filmmakers and has progressively expanded across various Google products and platforms. The key milestone was the announcement of its availability to Gemini Advanced users on April 15, 2025.
Currently, there are multiple routes to access Veo 2, each with its own characteristics and limitations:
- Gemini API / Vertex AI: this is the main route for developers and enterprise customers seeking to integrate Veo 2 into their own applications, and is considered production-ready. Access requires API keys and, for certain advanced functions such as specific editing or camera controls, it may be necessary to be on an allowlist. Companies such as WPP, Agoda, Mondelez and Poe are already using or testing Veo 2 through Vertex AI (a minimal call sketch follows this list).
- Google AI Studio: offers an experimental environment for developers to test Veo 2's capabilities. Initial access is usually free, but is subject to very strict usage quotas.
- VideoFX (Google Labs): an experimental tool aimed at creators, accessible through Google Labs and requiring sign-up on a waitlist. Initially, early access was restricted to users over 18 in the US, although Google plans to expand access.
- Gemini Advanced: Veo 2 is integrated as a feature for subscribers to the Google One AI Premium plan. It allows generating 8-second videos at 720p resolution, with monthly usage limits that are not explicitly defined (users are simply notified when they approach the limit). It is available globally in the countries and languages where Gemini Apps is supported.
- Whisk Animate (Google Labs): this experimental feature, also within Google Labs, uses Veo 2 to turn static images into 8-second animated video clips. It is available to Google One AI Premium subscribers in more than 60 countries.
- YouTube Shorts (Dream Screen): Veo 2 is being integrated into YouTube Shorts through the Dream Screen feature. This will allow creators to generate unique AI video backgrounds or even create standalone video clips from text prompts. The initial rollout covers the US, Canada, Australia and New Zealand.
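For the developer route, a minimal sketch of what a text-to-video call might look like with the Google Gen AI Python SDK is shown below. Treat the model ID, parameter names and polling pattern as assumptions to verify against the current Gemini API / Vertex AI reference rather than a definitive implementation.

```python
# Minimal text-to-video sketch (assumptions: google-genai SDK and the
# "veo-2.0-generate-001" model ID; verify against current documentation).
import time
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # or Vertex AI credentials

# Video generation runs as a long-running operation.
operation = client.models.generate_videos(
    model="veo-2.0-generate-001",
    prompt="A drone shot over a misty pine forest at sunrise, cinematic",
    config=types.GenerateVideosConfig(
        aspect_ratio="16:9",   # 16:9 or 9:16, per the specs discussed below
        number_of_videos=1,
    ),
)

# Poll until the clip (currently ~8 seconds at 720p) is ready.
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Download the resulting MP4.
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("veo2_clip.mp4")
```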
Pricing varies significantly between these platforms:
- API/Vertex AI: the cost is based on the length of generated video. Sources indicate prices between $0.35 and $0.50 per second, equivalent to $21–$30 per minute or $1,260–$1,800 per hour of generated video (see the quick cost estimate after this list). At launch Google has offered free credits ($300), and there may be initial free-usage periods in Vertex AI.
- Subscription: access through Gemini Advanced and Whisk Animate is included in the Google One AI Premium subscription ($20/month, €21.99 in Spain). By comparison, OpenAI's Sora is offered as part of the ChatGPT Plus ($20/month) and Pro ($200/month) subscriptions.
- Free/experimental: platforms such as Google AI Studio and VideoFX (via waitlist) provide free access, but with major limitations on quotas and available functionality.
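As a quick sanity check of those per-second figures, a trivial calculation using only the prices quoted above gives an idea of API costs at different volumes:

```python
# Rough API cost estimate using the $0.35–$0.50 per-second range quoted above.
def veo2_api_cost(seconds: float, price_per_second: float) -> float:
    return seconds * price_per_second

for label, seconds in [("one 8 s clip", 8), ("one minute", 60), ("one hour", 3600)]:
    low, high = veo2_api_cost(seconds, 0.35), veo2_api_cost(seconds, 0.50)
    print(f"{label}: ${low:,.2f} - ${high:,.2f}")
# one 8 s clip: $2.80 - $4.00
# one minute:   $21.00 - $30.00
# one hour:     $1,260.00 - $1,800.00
```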
The following table summarizes the access routes to Veo 2:
Table 1: Summary of access routes to Google Veo 2
| Platform | Access method | Typical user | Key specifications (current access) | Cost model | Availability status |
|---|---|---|---|---|---|
| Gemini API / Vertex AI | API key, allowlist (some features) | Developer, enterprise | 4K/minutes potential; API: 720p/8s | Per second ($0.35–$0.50) | GA, preview (editing) |
| Google AI Studio | Login | Developer | 720p/8s | Free (low quotas) | Experimental |
| VideoFX (Labs) | Login + waitlist | Creator | 720p/8s | Free (low quotas) | Waitlist (regional) |
| Gemini Advanced | Google One AI Premium subscription | Consumer | 720p/8s (16:9) | Subscription ($20/month) | GA (global) |
| Whisk Animate (Labs) | Google One AI Premium subscription | Consumer, creator | Image to video (8s) | Subscription ($20/month) | GA (60+ countries) |
| YouTube Shorts | Integrated in app | Content creator | Backgrounds / clips (8s?) | Free (integrated) | Rolling out (regional) |
This diversity of access points and pricing models reveals a tiered access strategy on Google's part. The highest capabilities (potentially 4K, longer videos, advanced controls) and the highest prices are reserved for business users and developers through the API, where perceived value and willingness to pay are greater. Meanwhile, more limited versions (720p, 8 seconds) are offered at lower cost to consumers and creators through subscriptions or free previews. This segmented approach allows Google to manage the complexity of the rollout and the high processing costs associated with video generation while maximizing potential revenue, adapting to the needs of different market segments.
However, this pricing strategy places Veo 2 in an interesting position relative to the competition. The high per-second cost of the API ($0.35–$0.50) contrasts sharply with Sora's inclusion in relatively affordable ChatGPT subscriptions ($20/$200 a month). Although Sora does not yet have a widely available public API with defined pricing, this fundamental difference in access model could put competitive pressure on Google's prices. If OpenAI or other competitors offer APIs with lower unit costs, or if high-quality models become accessible through cheaper subscriptions, professional users who need to generate large volumes of video could find alternatives more attractive than the Veo 2 API, potentially forcing Google to reconsider its price structure to stay competitive in that key segment.
Veo 2's technical capabilities: a leap in generative video
Veo 2 operates mainly in two modes: text-to-video (T2V) generation, where a textual description is transformed into a video scene, and image-to-video (I2V) generation, which animates a static image guided by an additional text prompt that defines style and motion. The model is the result of years of Google research in video generation, building on architectures and learnings from previous projects such as GQN, DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet and Lumiere, in addition to the Transformer architecture and the Gemini models.
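By way of illustration, the image-to-video mode differs from text-to-video mainly in that the request also carries a starting image. A hedged sketch, reusing the same assumed SDK and model ID as the earlier example, might look like this:

```python
# Image-to-video (I2V) sketch: animate a still image guided by a text prompt.
# Assumptions: same google-genai SDK and "veo-2.0-generate-001" model ID as above.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("still_frame.png", "rb") as f:
    start_image = types.Image(image_bytes=f.read(), mime_type="image/png")

operation = client.models.generate_videos(
    model="veo-2.0-generate-001",
    prompt="The camera slowly dollies in while leaves drift across the frame",
    image=start_image,  # the static image to animate
    config=types.GenerateVideosConfig(aspect_ratio="16:9"),
)
# Poll operation.done and download the MP4 as in the text-to-video example.
```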
In terms of technical output specifications, Veo 2 represents a significant advance, although with important caveats between its potential and what is currently accessible:
- Resolution: the base model is capable of generating video at up to 4K resolution, an improvement over Veo 1, which reached 1080p. However, many of the implementations currently accessible to the public (API/Vertex AI, AI Studio, Gemini Advanced, VideoFX) are limited to 720p, or 1080p in some contexts.
- Video duration: Veo 2 can generate clips that "exceed a minute" or reach up to two minutes of continuous duration, and potentially more, improving on Veo 1 (>60s). However, current access through the API, AI Studio and Gemini Advanced is usually restricted to 8-second clips.
- Frame rate: the API and Vertex AI documentation specifies 24 frames per second (fps). Some comparisons mention 30–60 fps.
- Aspect ratio: through the API/Vertex AI, 16:9 (landscape) and 9:16 (portrait) formats are supported. Output in Gemini Advanced is 16:9.
- Output format: videos generated through Gemini Advanced are delivered as MP4 files.
Beyond the basic specifications, Veo 2 introduces key qualitative improvements:
Video of a tomato being cut, generated by Veo 2
- Improved understanding and realism: the model demonstrates an advanced understanding of natural language and visual semantics, precisely interpreting the tone, nuances and details of long prompts. It uses Transformer architectures (possibly UL2 encoders) to process the text. Crucially, Google highlights the simulation of real-world physics as a key improvement. Examples such as water physics, burning paper or the precise cutting of a tomato without harming the fingers illustrate this capability, positioning it as a key differentiator against competitors such as Sora. This physical understanding translates into highly accurate motion rendering, with fluid, realistic movement of characters and objects. The result is video with greater realism and fidelity, fine detail and a significant reduction in visual artifacts (such as extra fingers or unexpected objects) compared to previous models, using techniques such as adaptive neural scene rendering and GANs. Temporal consistency has also been improved, keeping characters and objects stable across frames thanks to latent diffusion models. That said, as can be seen in the video, impossible imagery still appears at times, such as that marvelous slice of tomato that turns into half a tomato after being cut.
- Cinematographic control and styles: Veo 2 interprets the "unique language of cinematography." It understands terms such as "timelapse," "aerial shot," "drone," "tracking shot," "dolly," "close-up," "low-angle shot" and "pan right," and even lets you specify the desired genre. It offers extensive camera controls over shots, angles and movements, a standout advantage. It can simulate specific lens effects (e.g., an "18mm lens" for wide angle) and effects such as "shallow depth of field," including lens flares. It supports a wide range of visual and cinematographic styles (an illustrative prompt follows this list).
- Editing capabilities (preview/allowlist): Veo 2 introduces more sophisticated editing functions, although they currently require allowlist access in Vertex AI. These include masked editing, or inpainting, to remove unwanted elements (logos, distractions) from defined areas of the video, and outpainting, to extend the framing of the video by generatively filling in the new areas, useful for changing aspect ratios. Interpolation is also mentioned, to create smooth transitions between still images, along with general editing capabilities to refine or revise content without starting from scratch.
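To make the point concrete, here is an illustrative prompt of our own (not taken from Google's documentation) that strings together the kind of cinematographic vocabulary Veo 2 is reported to understand:

```python
# Hypothetical prompt composed for illustration; the terms mirror those listed above.
prompt = (
    "Aerial drone shot descending over a coastal village at golden hour, "
    "then a slow dolly-in to a close-up of a fisherman mending his net, "
    "18mm wide-angle lens, shallow depth of field, subtle lens flare, "
    "finishing with a pan right to reveal the harbor. Documentary style."
)
```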
Google's strong emphasis on Veo 2's understanding of physics and motion is not accidental. It appears to be a central architectural focus, aimed at correcting an important weakness observed in previous models and in competitors such as Sora (as the tomato-cutting example illustrates). By positioning realism as the main value proposition, Google is aiming directly at professional use cases (film previsualization, advertising, training) where unnatural motion breaks immersion and credibility. This focus strategically differentiates Veo 2 in the market, attracting users who prioritize fidelity over, perhaps, pure speed or more abstract creative freedom.
However, there is a notable gap between the announced potential and the reality accessible to most users. The difference between the promoted ability to generate multi-minute 4K videos and the real experience of obtaining 8-second 720p clips creates a marketing challenge and can breed disappointment. It suggests that, although the core model is powerful, scaling and optimizing it for broad, affordable access remains a considerable technical obstacle, probably due to high computational costs, inference times, or potential consistency and safety problems at longer durations. This discrepancy shapes user perception: people see stunning demonstrations but interact with a less capable tool, which could hurt the product's reputation despite its underlying potential.
Finally, the emphasis on specific cinematographic controls (lenses, shot types, depth of field) is clearly oriented toward professional filmmakers and creators. This approach aligns with the API's premium pricing model and enterprise partnerships, suggesting an initial goal of breaking into professional workflows. Google seems to see its primary market in professional content creation (advertising, film previsualization, marketing), where these controls offer value significant enough to justify the cost, beyond simple consumer entertainment.
From Veo 1 to Veo 2
To fully appreciate Veo 2's advances, it is useful to first establish the baseline set by its predecessor. Veo 1 already offered notable capabilities: video generation at up to 1080p, durations of over 60 seconds, understanding of cinematographic terms, image-to-video generation, editing commands, consistency improvements via latent diffusion, and the implementation of SynthID watermarks and safety filters.
Veo 2 represents a significant evolution on this foundation, with key improvements in several areas:
- Resolution: the most obvious leap is Veo 2's target resolution, which reaches up to 4K, surpassing Veo 1's 1080p maximum.
- Realism and fidelity: Veo 2 introduces "significant improvements" in detail, realism and artifact reduction compared to previous models and competitors. It produces fewer visual "hallucinations," although, as the video in this article shows, not always.
- Motion and physics: it has "advanced motion capabilities" and better simulation of real-world physics, going beyond Veo 1's focus on consistency.
- Camera control: it offers "greater" and more precise camera control options, expanding the understanding of cinematographic terms that Veo 1 already had.
- Video duration: the potential duration is extended, exceeding the minute-plus offered by Veo 1.
- Editing: it introduces more sophisticated editing capabilities such as inpainting and outpainting (in preview), going beyond the editing commands described for Veo 1.
The following table directly compares the key capabilities of Veo 1 and Veo 2:
Table 2: Feature comparison, Veo 1 vs. Veo 2
| Feature | Veo 1 | Veo 2 |
|---|---|---|
| Maximum resolution | 1080p | Up to 4K (potential) |
| Maximum duration (potential) | >60 seconds | Up to 2 minutes or more |
| Physics / motion | Focus on consistency | Advanced physics simulation, realistic motion |
| Realism / fidelity | High quality | Significant improvements, fewer artifacts |
| Cinematographic control | Understanding of terms | Greater precision and options (lenses, etc.) |
| Editing functions | Basic editing commands | Inpainting, outpainting (preview) |
This progression from Veo 1 to Veo 2 illustrates an iterative improvement strategy on Google's part. The advances in resolution, realism, physics and control are not random; they target fundamental aspects of video quality and control that are crucial for professional adoption. The pattern suggests a structured development process and a long-term commitment to refining the underlying technology.
Limitations and challenges of Veo 2
Despite its impressive capabilities, Veo 2 is not free of limitations and challenges, some inherent to current AI video generation technology and others specific to its implementation and rollout.
- Prompt complexity and adherence: although natural language understanding has improved markedly, Veo 2 still struggles with extremely complex or detailed prompts, failing to follow every instruction precisely. Prompt engineering remains crucial to obtaining good results. While benchmarks indicate high prompt-adherence scores, there are cases where the model falls short of expectations.
- Artifacts and consistency: visual artifacts, although reduced, have not been completely eliminated. Subjects may show occasional deformities, text may be illegible, and "hallucinations" such as extra fingers or unexpected objects can appear. Temporal consistency may fail in very complex scenes or with rapid motion, and the physics simulation can break down in particularly complex scenarios. Some user-generated examples have been described as "unnatural" or "unsettling."
- Generation speed: the time needed to generate a video can be considerable. Some comparisons cite about 10 minutes per clip, in contrast to the roughly 5 minutes attributed to Sora. However, some integrations, such as YouTube Shorts, appear to operate much faster. API latency is officially described as "typically within a few minutes, but it may take longer."
- Editing tools: the lack of integrated editing tools in some access interfaces (the API, possibly the initial version of Gemini Advanced) forces users to turn to external software to make modifications. The most advanced editing functions in Vertex AI require allowlist access. Sora, by contrast, includes built-in editing tools.
- Available controls: some early users noticed that the version of Veo 2 they tested lacked controls for video resolution or duration compared to Sora. However, the API/Vertex AI does offer parameters to control duration, aspect ratio, negative prompts and the generation seed (see the configuration sketch after this list).
- Access and cost: as detailed above, fragmented access, waitlists, geographic restrictions and the high cost of the API represent significant barriers to adoption. At the moment the quotas on free tiers are extremely low, although given how recent the launch is, a proper evaluation will have to wait.
- Content restrictions and safety filters: the safety filters Google has implemented are strict and can block content generation unexpectedly, even for apparently harmless prompts. There are specific restrictions on generating people, especially minors (controlled by API parameters such as allow_adult or disallow). Users have reported problems generating videos even from images containing people, or in scenes without them. This over-filtering can make the tool unusable for certain use cases.
- Capability gaps: currently accessible versions lack sound generation. Difficulty generating realistic hands remains a common problem across all AI models.
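For reference, the controls and person-generation restrictions mentioned above map onto request parameters roughly as follows. This is a hedged sketch: the parameter names (duration_seconds, negative_prompt, seed, person_generation) follow our reading of the public API documentation, may differ by platform or be restricted to Vertex AI, and should be verified before use.

```python
# Sketch of the generation parameters discussed above (assumed field names;
# verify against the current Gemini API / Vertex AI reference).
from google.genai import types

config = types.GenerateVideosConfig(
    aspect_ratio="9:16",              # portrait output
    duration_seconds=8,               # clip length (currently capped low)
    negative_prompt="text overlays, watermarks, blurry footage",
    seed=42,                          # for more reproducible generations
    person_generation="allow_adult",  # default; a "disallow"-type value blocks people entirely
)
```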
These limitations reveal an inherent trade-off between capability and usability. Although Veo 2 boasts high-end capabilities (4K potential, realistic physics), restrictions on speed, the controls available (in some versions), the lack of integrated editing and strict content filters significantly affect practical usability. Compared with competitors that may be faster, more integrated or less restrictive (such as Sora or Runway), Veo 2 users may get higher potential quality at the cost of a more cumbersome or limited user experience. This can affect adoption, especially for iterative or time-sensitive workflows.
In addition, reports of overly aggressive content filters blocking harmless prompts suggest Google may be over-indexing on safety and brand risk mitigation. This caution may stem from past controversies with other AI models (such as Gemini's image generation). While safety is essential, overly strict filters can render the tool unusable for many common use cases (for example, animating family photos), creating an important limitation driven by risk aversion.
Finally, the combination of capability gaps (720p/8s vs. 4K/minutes), usability issues (speed, inconsistent controls) and access barriers amplifies the "demo vs. reality" problem. The average user experience can be far removed from the polished demonstrations Google presents, which could damage credibility if expectations are not carefully managed. This significant gap between promise and the reality users experience can lead to disappointment and a negative perception, despite the technological achievement Veo 2 represents.
Veo 2 vs. Sora and others
Veo 2's position in the market is largely defined by comparison with its main rival, OpenAI's Sora, as well as Runway.
Direct comparisons (Veo 2 vs. Sora):
- Quality/realism: numerous early sources and users cite Veo 2 as superior in realism, physics simulation and visual detail. Sora, on the other hand, sometimes struggles with fine details (such as hands) and physics. Some analyses suggest Sora may be more "artistic" or creatively flexible.
- Resolution: Veo 2 has a potential of up to 4K, while Sora is limited to 1080p.
- Duration: Veo 2's potential (1–2 minutes or more) exceeds the durations cited for Sora (20 or 60 seconds). However, current access to Veo 2 is usually limited to shorter clips (8 seconds).
- Speed: Veo 2 (approx. 10 min) is generally slower than Sora (approx. 5 min). It is worth noting the existence of "Sora Turbo," a possibly faster and cheaper version, but potentially lower quality than Sora's original demos.
- Control: Veo 2 is praised for its cinematographic controls, while Sora stands out for its flexibility and features such as storyboarding. However, MKBHD found that the Veo 2 version he tested had fewer controls than Sora.
- Editing: Veo 2 lacks integrated editing (except in Vertex AI with an allowlist); Sora offers built-in tools (Remix, Loop, Blend).
- Access/price: access to Veo 2 is fragmented and the API is costly; Sora is accessible through cheaper subscriptions. At present, Sora is more accessible to the general public.
Benchmarking and other competitors:
Results from the MovieGenBench benchmark, in which human evaluators rated videos generated from more than 1,000 prompts, showed that Veo 2 outperformed Sora Turbo, Kling and MovieGen in both overall preference and prompt adherence (evaluated at 720p with variable durations). However, it is crucial to recognize the limitations of such benchmarks, which may rely on cherry-picked results or specific data sets.
The competitive landscape also includes Runway (with Gen-3 Alpha/Gen-4), Kling, AWS Nova Reel, Hailuo, MiniMax and potentially Meta's MovieGen. Some users even express a preference for Runway or Hailuo over the version of Sora they currently have access to.
The following table offers a comparative snapshot of Veo 2 against its main competitors:
Table 3: Comparative snapshot of AI video generators
| Feature | Google Veo 2 | OpenAI Sora | Runway (Gen-3/4) |
|---|---|---|---|
| Main strength | Realism, physics, cinematic control | Speed, creative flexibility, editing | Fine control, specific modes (implied) |
| Max. resolution | 4K (potential) | 1080p | Variable (720p–1080p+ depending on plan/version) |
| Max. duration | 2 min+ (potential) | 20s / 60s | ~15s (Gen-2), longer in Gen-3/4 (variable) |
| Speed | Slower (~10 min) | Faster (~5 min) | Fast (Gen-4 real time?) |
| Editing tools | Limited / external (API) | Integrated (Remix, Loop, etc.) | Integrated (implied) |
| Access model | Fragmented (API, subscriptions, Labs) | ChatGPT subscription | Subscription / credits |
| Price model | API: $/sec; sub: $20/month | Sub: $20/$200 per month | Annual plans ($144–$1,500) |
This comparison suggests a possible market segmentation based on each tool's strengths. Veo 2 seems aimed at high-fidelity professional use that values cinematographic quality and physical precision. Sora could attract a broader audience of social media content creators and creative experimenters thanks to its speed, flexibility and integrated editing. Runway, with its iterative approach and more specialized features, could find its niche among visual artists and VFX professionals. The market does not appear monolithic; different tools are likely to coexist, serving different segments according to their core strengths.
It is crucial to apply a "released version" caveat when weighing these comparisons. Often, the public version of one model (such as "Sora Turbo," which some users consider inferior to the initial demos) is contrasted with carefully selected demos or limited-access versions of another (Veo 2). This makes definitive judgments difficult. The "best" model can depend largely on which specific version is being evaluated and under what conditions, making superiority a moving target.
Finally, there is a recurring hypothesis about Google's data advantage. Several sources speculate that Google's direct, massive access to YouTube data gives it a significant edge in training Veo 2 to achieve realistic motion and understand diverse scenarios, compared with competitors that may need to resort to data scraping. While not officially confirmed, access to such a vast and potentially well-labeled video dataset could be a crucial long-term competitive moat, potentially explaining Veo 2's perceived advantage in realism and being difficult for others to replicate legally and effectively.
Safety and ethics in Veo 2
Google has emphasized its commitment to responsible AI principles in the development and deployment of Veo 2. The company claims to have carried out extensive red-teaming and evaluations to prevent the generation of content that violates its policies. Two main technical mechanisms support this approach:
- SynthID watermark: this technology is a key safety feature implemented in Veo 2 and other Google generative models. It is an invisible digital watermark embedded directly in the pixels of the video frames during generation. It is designed to persist even if the video is edited (cut, filtered, compressed) and does not affect perceptible visual quality. Its purpose is to allow content to be identified as AI-generated through specialized detection tools, helping to combat misinformation and misattribution.
- Safety filters: Veo 2 incorporates filters designed to prevent the creation of harmful content. The API includes specific parameters to control the generation of people, such as allow_adult (allow adults only, the default) or disallow (do not allow people). However, as noted above, users report that these filters can be excessively restrictive.
Beyond these technical measures, Veo 2's deployment is part of a broader ethical landscape with several key concerns:
- Deepfakes and misinformation: the ability to generate realistic video carries the inherent risk of creating convincing deepfakes to spread false information or impersonate people maliciously. SynthID is Google's main technical defense against this risk.
- Intellectual property and copyright: ownership of AI-generated content remains a legal gray area. Concerns also arise about the data used to train these models, such as the possible use of YouTube videos without explicit consent for that purpose.
- Bias: as with any AI model trained on large datasets, there is a risk that Veo 2 perpetuates or amplifies existing social biases in its outputs, although Google claims to take measures to mitigate this.
- Labor displacement: the growing capability of these tools raises concerns about their impact on creative industries, with potential displacement of roles in film, animation, marketing and design. One cited study estimates a significant impact on US jobs by 2026.
Google's prominent deployment of SynthID across its generative models represents a proactive technical approach to the risks of misinformation. Embedding the watermark during generation is a built-in preventive measure, unlike post-hoc detection. This suggests Google considers watermarking fundamental to responsible deployment. However, the success of this strategy depends on the real robustness of the watermarks and on the widespread adoption of reliable detection tools. It is a technical solution to a complex socio-technical problem.
The tension between implementing robust safety filters and maintaining usefulness for the user, evidenced by the complaints, underscores a fundamental dilemma for AI developers: safety vs. utility. Excessively strict filters can render a tool unusable, while lax filters increase risk. Finding the right balance is an ongoing challenge, with significant implications for user adoption and social impact. Google's current calibration appears to lean toward caution, which could hurt its competitiveness if users find the tool too restrictive for their needs.
Finally, features such as SynthID and configurable safety parameters (imperfect as they are) represent Google's attempt to embed ethical considerations into the product's design itself. This goes beyond policy statements into technical implementation. While the execution may have flaws (overly strict filters), the approach of integrating safety into the tool's architecture reflects a specific stance on responsible AI development, seeking to enforce ethical use through the technology itself.
Impact and future trajectory of Veo 2
The launch and evolution of Veo 2 have significant implications that extend beyond its technical specifications, potentially affecting multiple industries and redefining creative processes.
Impact on creative industries:
Veo 2 has the potential to revolutionize workflows in several sectors:
- Film: it can speed up previsualization and concept testing, generate background assets, and even produce complete short films. The collaboration with filmmakers such as Donald Glover and his studio Gilga underscores this focus.
- Marketing and advertising: it enables rapid prototyping of ads, generation of customized advertising content at scale, and creation of product demonstrations. Companies such as Mondelez, WPP, Agoda, Alphawave and Trakto are already exploring it, citing drastic reductions in production times (from weeks to hours, according to Kraft Heinz) and reduced dependence on stock footage.
- Video games: it can be used to generate cinematics or realistic promotional material.
- Education and training: it facilitates the creation of illustrative videos to explain complex concepts or simulate procedures (e.g., medical training).
- Social media: integration with YouTube Shorts and the ability to generate short, engaging clips make it a powerful tool for content creators on platforms such as TikTok.
Democratization vs. Disruption:
Veo 2 embodies a duality: on the one hand, it democratizes high-quality video production, making it accessible to small businesses and individual creators who previously lacked the necessary resources or technical skills. On the other, it threatens to disrupt traditional roles in the creative industries and fuels concerns about the proliferation of automatically generated low-quality content, or "AI slop."
Future Development:
Users hope Veo 2 will gain many improvements in subsequent versions, such as:
- Capability expansion: continued quality improvements, broader rollout of 4K and longer-duration capabilities, and possibly the addition of sound generation.
- Ecosystem integration: deeper integration with other Google products such as Vertex AI, YouTube, and potentially Search and the Gemini ecosystem. Combining it with Gemini is envisaged as a way to improve understanding of the physical world.
- Rapid evolution: the pace of development will remain high, driven by intense competition in the field, with further advances expected in the coming years.
The analysis suggests that tools like Veo 2 do not eliminate creative work but shift the bottleneck. The main difficulty no longer lies in technical execution (filming, editing, visual effects) but in ideation, prompt engineering and editing the generated content. Success will increasingly depend on creative vision and the ability to communicate effectively with the AI. Creative direction and the ability to formulate precise, evocative prompts become critical skills.
Rather than outright replacement, the most likely short-term impact is the emergence of "AI-augmented" professional roles. Professionals in film, marketing, design and other fields will use tools like Veo 2 to improve productivity, accelerate iteration and explore new creative possibilities. This will require adaptation and the development of new skills focused on using these tools effectively, transforming existing roles rather than eliminating them outright in many cases.
Finally, the integration of Veo 2 into the Google ecosystem (Gemini, Vertex AI, YouTube, Labs) is a clear strategic play. It seeks to create synergies (using Gemini to generate prompts, Imagen for I2V inputs, YouTube data for training) and to keep users within Google's platforms. This holistic approach could provide a competitive advantage over standalone tools, making Google's offering more attractive than the simple sum of its parts for users already embedded in its ecosystem.
Videos generated by Veo 2
Below are several videos generated by Veo 2. As you will see, Veo 2 tends to generate impossible elements; beneath each one we indicate the prompt used.
Video of a parakeet tapping a window pane with its beak, generated by Veo 2
Video of a passenger airplane flying through clouds with a person on the fuselage, generated by Veo 2
Disney-movie-style video of a rabbit reading a book, generated by Veo 2