Industry-leading cloud big data platform
38.8%
43.5%
15.3%
1.2%
1.2%
Easy Cluster Setup, Scalability, Cost-Effective for Transient Use, Wide Range of Application Support
Cluster Management Challenges, Complex Configuration, Slow Cluster Startup Time, Limited Real-Time Elasticity
Amazon EMR is a popular choice for running big data applications on AWS, praised by users for its ease of use, scalability, and integration with other AWS services. Users appreciate the simple setup and configuration, the ability to run various applications like Apache Spark, Flink, and Trino, and the cost-effective spot instances. However, some users find the user interface lacking features like auto-completion, and experience difficulties with debugging incidents and working with spot instances during unavailability. While EMR excels at batch processing large datasets, users report challenges with real-time scaling and note that the service can be expensive.
AI-Generated from the text of User Reviews
Control of specifying the configuration, and the debugging support
Troubleshooting incidents is hard with support
Spark applications, and drives all data pipelines
My workloads run faster and I have more time to work on refining the code, instead of just sitting down waiting for the query to run
It's not as elastic as promoted. I would like the cluster to scale in real time
Run big data analytics faster
User-customisable plans, cost efficiency, and of course the best UI.
Cloud processing is still slow because it is virtualised, but it is still the better choice.
I have many data science tasks, like calculating statistics and various mathematical functions and then implementing the solution provided by the business. It is one of the best services for working with big data.
Easy to setup.
Cost friendly.
Faster processing.
Quick management.
Monitoring as an add-on.
Highly scalable.
Can be integrated with lots of technologies.
More secure.
Setting up Amazon EMR clusters requires people with good AWS skills, and it is a bit difficult for a newbie to find and attach the applicable permissions and policies.
Nice tool for big data work
It provides a good graphical user interface for managing and working with big data MapReduce jobs, rather than a manual setup with Hadoop or the CLI. It saves a lot of time and effort.
EMR can execute code on Spark or on other cluster types such as Hadoop, which improves performance and reduces total job execution time.
Execution time comes down to a few minutes, as against several hours running on either EC2 or other computing servers.
It is easy to choose between Hadoop-based and Spark-based EMR clusters, and EMR can be used in conjunction with other AWS services; for example, we can build an orchestration involving EMR and several other tasks on the AWS Data Pipeline service.
It takes time to spin up an EMR cluster, sometimes fifteen to thirty minutes when it is used for a task for the first time; this happens mostly when starting a new cluster.
Once triggered, adding a new task is easy, but the initial setup takes time.
We also have to think about the use case and the code EMR will be used for: at times EC2 can finish the same processing in a comparable amount of time, and in those cases we might end up increasing the overall cost of the project.
We use it to move processing that takes several hours on EC2 onto a Spark-based EMR cluster, where it completes within minutes.
Ease of use and the ability to choose either Hadoop or Spark.
Processing time decreases from 6-9 hours to 30-40 minutes compared with the EC2 instance, and by more in some cases.
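To put the figures in this review in perspective, the short sketch below converts the quoted ranges (6-9 hours before, 30-40 minutes after) into a speedup factor. The function name `speedup_range` is made up for illustration, and the numbers are the reviewer's own, not a benchmark.

```python
def speedup_range(before_hours, after_minutes):
    """Return the (lowest, highest) speedup implied by two duration ranges.

    before_hours:  (low, high) runtime before the change, in hours
    after_minutes: (low, high) runtime after the change, in minutes
    """
    before_low, before_high = (hours * 60 for hours in before_hours)
    after_low, after_high = after_minutes
    # Worst case: shortest old runtime against longest new runtime.
    # Best case: longest old runtime against shortest new runtime.
    return before_low / after_high, before_high / after_low


low, high = speedup_range((6, 9), (30, 40))
print(f"Speedup: {low:.0f}x to {high:.0f}x")
```

On the reviewer's numbers this works out to roughly a 9x to 18x reduction in wall-clock time. It says nothing about cost, which depends on how many instances the cluster uses.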
EMR is used to develop big data solutions. I have developed many big data solutions using AWS EMR.
Need more documentation on the Amazon site to know about EMR
Need more training on Amazon EMR service
I solved the big data lake problem using EMR and Apache Spark.
The auto-scaling option to evaluate, transform, and load the data without needing to configure new settings.
The user interface is hard to understand.
ETL and process logs
The ability to size the cluster to the needs of each job
With error logging, it is sometimes hard to track down the root cause
Start with small jobs, and take advantage of new features in newer releases
We are building data assets for multiple business units to share via S3 and Redshift
Simplicity of EMR in usability and scalability. Seamless access to S3.
Can be expensive to start with. You need some understanding of other AWS features to start using it.
Data analytics
It is very easy to launch or clone an EMR cluster. EMR also provides very easy scaling capabilities based on containers, CPU, spot instances, and the use of instance fleets or instance groups. And EMR supports many widely used applications such as Spark, Hive, Hadoop, Trino, Presto, Ranger, and Flink.
Working with Spot instances on EMR is slightly complicated when Spot capacity is unavailable and you need to use instances in one particular availability zone. Many solutions, such as Databricks, provide a fallback that is even easier to use.
Amazon EMR is helping us create the data platform and run 3,000-6,000 data pipeline jobs daily, and it is also helping us consume the data stored in the data lake for visualizations and applications.
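On the Spot availability complaint above: EMR instance fleets do accept a Spot timeout with an on-demand fallback, which may cover part of what the reviewer is asking for. The sketch below only builds the configuration dictionary in the shape the EMR `RunJobFlow` API expects for an instance fleet; it does not call AWS. The helper name, fleet name, instance types, and 20-minute timeout are illustrative assumptions, and as far as I know this timeout applies while the cluster is first being provisioned rather than throughout its lifetime.

```python
def core_fleet_with_fallback(instance_types, spot_units, timeout_minutes=20):
    """Build a CORE instance fleet that prefers Spot but can fall back.

    Hypothetical helper for illustration; the returned dict follows the
    InstanceFleets entry shape used by the EMR RunJobFlow API.
    """
    return {
        "Name": "core-fleet",  # placeholder name
        "InstanceFleetType": "CORE",
        "TargetSpotCapacity": spot_units,
        "TargetOnDemandCapacity": 0,
        # Listing several instance types gives EMR more Spot pools to draw from.
        "InstanceTypeConfigs": [
            {"InstanceType": itype, "WeightedCapacity": 1}
            for itype in instance_types
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                # If Spot capacity is not fulfilled within this window,
                # switch to on-demand instead of terminating the cluster.
                "TimeoutDurationMinutes": timeout_minutes,
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
            }
        },
    }


fleet = core_fleet_with_fallback(["m5.xlarge", "m5a.xlarge"], spot_units=4)
print(fleet["LaunchSpecifications"]["SpotSpecification"]["TimeoutAction"])
```

A dictionary like this would go in the `InstanceFleets` list under `Instances` when launching a cluster. Whether it matches the ease of the fallback the reviewer describes in other products is a judgment call.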