A simple yet powerful way to manage data warehouses
Ease of use for setting up data pipelines
The customer support is a little less useful for more complex issues.
Patches are often applied to workspaces, which can lead to failures.
Databricks provides one single platform for all of Data Engineering, Data Science, and Ops.
It is quick to adopt new cutting-edge tech, often leading the way.
My comments on the Lakehouse are specific to Unity Catalog (UC):
Governance is all about being a "benevolent bad cop" to enterprise audiences! Until now (i.e., before the advent of UC), that message was mostly only deliverable via a stale PowerPoint, and often only after the governance teams enforced compliance standards, possibly following an adverse event such as a data breach. What I have been able to show-and-tell via live DBX UC demos to the enterprise users of the largest healthcare provider has captured their rapt attention! That is my experience. Now, coming to the features UC offers: OKTA integration to rope the identities of any IAM system over to UC, APIs to set up access grants and schema object creation, security via RLS/CLM, and above all, I feel, the cross-workspace access setup, which ensures that LOBs/teams with data assets spread across several catalogs get seamless and ubiquitous data sharing.
These features allow power users who are skilled in ANSI SQL to execute their queries across the three-level namespace (catalog.schema.table) once cross-workspace access is set up. Now, coming to the ML-model-building data scientists and citizen data scientists: a model experiment and its features can be registered centrally in Unity Catalog, ensuring centralized governance of the resulting endpoints that enable Model Serving.
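To make that concrete, here is a minimal sketch in Python of the two capabilities just described; the catalog, schema, table, and model names are illustrative assumptions, not the reviewer's actual assets, and spark is the session a Databricks notebook provides.

    # Once cross-workspace access is set up, power users query across the
    # three-level namespace (catalog.schema.table) in plain ANSI SQL.
    spark.sql(
        "SELECT member_id, claim_total "
        "FROM claims_catalog.gold.claims "  # hypothetical catalog/schema/table
        "LIMIT 10"
    ).show()

    # Registering a model in Unity Catalog so the serving endpoints built on
    # it inherit the same centralized governance.
    import mlflow

    mlflow.set_registry_uri("databricks-uc")
    mlflow.register_model(
        "runs:/<run_id>/model",                 # run URI left as a placeholder
        "claims_catalog.ml.claims_risk_model",  # hypothetical UC model name
    )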
The future release of ABAC (attribute-based access control, as opposed to RBAC) could deliver compute/cluster economies of scale and scope from a cost perspective, while making sensitive-data masking and tagging at the DDL level seamless.
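Until ABAC arrives, the DDL-level masking and tagging described above can be approximated with today's primitives; this is a hedged sketch with hypothetical function, table, and group names, run from a Python cell.

    # Define a masking function that reveals the column only to a
    # privileged group (all names here are hypothetical).
    spark.sql("""
        CREATE OR REPLACE FUNCTION main.governance.mask_ssn(ssn STRING)
        RETURN CASE WHEN is_account_group_member('compliance') THEN ssn
                    ELSE '***-**-****' END
    """)

    # Attach the mask and a sensitivity tag at the DDL level.
    spark.sql("ALTER TABLE main.claims.members "
              "ALTER COLUMN ssn SET MASK main.governance.mask_ssn")
    spark.sql("ALTER TABLE main.claims.members "
              "ALTER COLUMN ssn SET TAGS ('sensitivity' = 'high')")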
Another eagerly anticipated feature would be automated sensitive-data identification and tagging, via the OKERA integration, of all "DBx-registered data assets in DBx catalogs".
The use of service principals as identities opens the scope to intelligently manage/address the limit on the number of AD groups/global groups that can be created.
These are my current observations.
Not a "poke in the eye" of the hard working Solutions Enginners who face us the clients, music , but ....
1. The product engineering teams appear not to have digested the governance narratives that enterprises expect to be met out of the box, not after waiting for a product release.
2. Spark-engine-centric DBx computes/workspaces will see heavy legacy SQL code with all its fun (hard coding, nested sub-queries, temp-table use, CTAS, et al.), yet the product engineering teams appear not to have such folks at the "product design" phase. Ditto, even more so, for point #1.
3. The publicly available documentation appears stale compared with the features being released.
4. The commitment to deliver a feature (for example, ABAC) on the set date has slipped across several quarters over close to two years! When you promise to solve world hunger and keep moving the goalposts, credibility is impaired.
Hey, how come your smart alecks did not realize that we use DBx for "Data Governance"? List that too!!
1. Support for ACID transactions, time travel, and versioning (see the sketch after this list)
2. Unity Catalog for access control
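A minimal sketch in Python of the time travel and versioning called out in point 1; the table name is an illustrative assumption, and spark is the session a Databricks notebook provides.

    # Query the current Delta table, then the same table as of an earlier
    # version and as of a timestamp (Delta Lake time travel).
    spark.sql("SELECT COUNT(*) FROM events").show()
    spark.sql("SELECT COUNT(*) FROM events VERSION AS OF 3").show()
    spark.sql("SELECT COUNT(*) FROM events TIMESTAMP AS OF '2024-01-01'").show()

    # The version history behind this is itself queryable.
    spark.sql("DESCRIBE HISTORY events").show()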
As Databricks Lakehouse is built on top of Delta Lake, it sometimes throws errors related to the underlying storage.
1. Storing and retrieving data, and being able to perform transformations on huge amounts of data without any hiccups.
Cluster creation is now made easy through a simple configuration page.
Workspace allows you to organise all your notebooks in one place.
Job mode lets you schedule notebook execution and plan dev/prod pipelines, as sketched below.
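As a hedged sketch of that job mode, scheduling a notebook with the Databricks SDK for Python might look like the following; the job name, notebook path, cluster ID, and cron expression are placeholders, not values from this review.

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()  # authenticates from the environment or config file

    # Create a job that runs a notebook nightly at 02:00 UTC.
    w.jobs.create(
        name="nightly-pipeline",
        tasks=[
            jobs.Task(
                task_key="run_notebook",
                notebook_task=jobs.NotebookTask(notebook_path="/Repos/team/pipeline"),
                existing_cluster_id="0123-456789-abcdefgh",  # placeholder ID
            )
        ],
        schedule=jobs.CronSchedule(
            quartz_cron_expression="0 0 2 * * ?",
            timezone_id="UTC",
        ),
    )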
Data visualization in notebook output cells is basic, even if it is good for simple applications. The dashboard section could be improved by increasing clarity. These are, however, minor complaints.
Databricks is helping me save time when developing code and running jobs at given datetimes.
The autocomplete tool is very efficient, especially when dealing with very long code, and installing Python packages or Java libraries is no longer a problem.
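For example, inside a Databricks notebook, a session-scoped package install is a single magic command in a cell of its own (the package name is just an example):

    %pip install beautifulsoup4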
A great platform for focusing on industry challenges around data and AI. The good part is that solutions for those challenges are quickly built, tested, and released. Participation during private previews also makes sure that these solutions are fit for purpose for industry challenges.
The best features revolve around combining data lake and data warehouse capabilities to help reduce cost and deliver faster with improved security.
Its integration with native cloud services is still weak; out-of-the-box integration with an organization's identity federation is still not mature, along with the capability to integrate with an enterprise catalog and build a unified metric system for the organization.
A unified view of all our data sources, easy sharing of data with our product team, an easy platform for data owners to democratise their data, and a central place to apply security and governance.
Databricks Lakehouse Platform impressively unifies data lakes and data warehouses, empowering seamless data access and analysis. Its Apache Spark-powered engine ensures lightning-fast processing, while advanced analytics and machine learning capabilities drive data-driven insights. With robust security, auto-scaling, and managed services, Databricks simplifies data management and boosts collaboration among data teams. An extensive range of integrations further enhances its versatility, making it a game-changer for data-driven organizations.
Learning curve - for users unfamiliar with Apache Spark, there may be a learning curve to fully utilize the platform's capabilities.
Complexity - managing and optimizing large-scale data workflows can be complex, requiring skilled data engineers and administrators.
Sometimes new features are not fully tested, which may cause some problems in the future - but honestly it's not a big disadvantage.
It keeps everything in one all-in-one product for creating ETL pipelines and data governance solutions. It also lets you simply scale your workload when really needed.
Databricks offers a complete platform for managing all of our data - from ingesting events at high scale to writing tens of aggregations and models on top of the data - all in one place.
Databricks is very fast-paced, which means we always have to be on our feet, learning new features and watching how the platform evolves. Sometimes this can lead to changes in existing code to migrate to newly developed features.
Databricks gives us the ability to process large amounts of data and get all the benefits of Spark, Delta Lake, and MLflow, all under the same platform. This removes the need to manage or build such infrastructure on our own.
1. ACID compliance on the data lake, which not only saves storage cost but also makes queries faster.
2. Customizable per budget (through correct cluster sizing and other means).
3. Init scripts are really a boon if used correctly.
1. Clusters often take a lot of time to start up.
2. I personally encountered many bugs in the new Unity Catalog feature.
3. The information schema is missing on hive_metastore.
The primary problem that Databricks Lakehouse Platform is solving is storing and processing big data. With its support for a wide variety of languages like Python, Scala, SQL, etc., it becomes mighty helpful for processing data. Role-based access management is a blessing for data governance.
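A minimal sketch of that role-based access management, with hypothetical catalog, schema, table, and group names; the same grants can be issued from Python, Scala, or a SQL editor.

    # Let a group read one table: grant catalog and schema usage,
    # then SELECT on the table itself.
    spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_readers`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA main.analytics TO `data_readers`")
    spark.sql("GRANT SELECT ON TABLE main.analytics.orders TO `data_readers`")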
We have been using Azure Databricks for over three years. The evolution of Databricks is impressive, as is its stability. When our users (data engineers, data scientists) want a new feature, it very often happens that within a week to three months, that feature is developed by the Databricks team. Magic!
As a user, if you want to keep pace with the platform's new features and capabilities, you have to stop sleeping :) Fortunately, the documentation is abundant, and the learning materials are of good quality.
Sentiment detection use cases, anomaly/fraud detection, and risky-customer detection. By associating it with cognitive services: identity document recognition use cases and KYC.
The best thing about the platform is its ease of use; it is literally Spark as a service.
What it needs is a more convenient way to connect to the notebooks via an IDE.
It saves me from having to create my own services and administer them.