
VALID AWS-Certified-Data-Analytics-Specialty Exam Dumps For Certification Exam Preparation
AWS-Certified-Data-Analytics-Specialty Dumps PDF 2024 Strategy Your Preparation Efficiently
NEW QUESTION # 111
A company is planning to do a proof of concept for a machine learning (ML) project using Amazon SageMaker with a subset of existing on-premises data hosted in the company's 3 TB data warehouse. For part of the project, AWS Direct Connect is established and tested. To prepare the data for ML, data analysts are performing data curation. The data analysts want to perform multiple step, including mapping, dropping null fields, resolving choice, and splitting fields. The company needs the fastest solution to curate the data for this project.
Which solution meets these requirements?
- A. Ingest data into Amazon S3 using AWS DataSync and use Apache Spark scrips to curate the data in an Amazon EMR cluster. Store the curated data in Amazon S3 for ML processing.
- B. Take a full backup of the data store and ship the backup files using AWS Snowball. Upload Snowball data into Amazon S3 and schedule data curation jobs using AWS Batch to prepare the data for ML.
- C. Create custom ETL jobs on-premises to curate the data. Use AWS DMS to ingest data into Amazon S3 for ML processing.
- D. Ingest data into Amazon S3 using AWS DMS. Use AWS Glue to perform data curation and store the data in Amazon S3 for ML processing.
Answer: D
NEW QUESTION # 112
A company wants to optimize the cost of its data and analytics platform. The company is ingesting a number of .csv and JSON files in Amazon S3 from various data sources. Incoming data is expected to be 50 GB each day. The company is using Amazon Athena to query the raw data in Amazon S3 directly. Most queries aggregate data from the past 12 months, and data that is older than 5 years is infrequently queried. The typical query scans about 500 MB of data and is expected to return results in less than 1 minute. The raw data must be retained indefinitely for compliance requirements.
Which solution meets the company's requirements?
- A. Use an AWS Glue ETL job to compress, partition, and convert the data into a columnar data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the processed data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after the object was last accessed. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after the last date the object was accessed.
- B. Use an AWS Glue ETL job to partition and convert the data into a row-based data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the data into the Amazon S3 Standard- Infrequent Access (S3 Standard-IA) storage class 5 years after the object was last accessed.
Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after the last date the object was accessed. - C. Use an AWS Glue ETL job to compress, partition, and convert the data into a columnar data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the processed data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after object creation.
Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after object creation. - D. Use an AWS Glue ETL job to partition and convert the data into a row-based data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the data into the Amazon S3 Standard- Infrequent Access (S3 Standard-IA) storage class 5 years after object creation. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after object creation.
Answer: A
NEW QUESTION # 113
A company uses Amazon Elasticsearch Service (Amazon ES) to store and analyze its website clickstream data. The company ingests 1 TB of data daily using Amazon Kinesis Data Firehose and stores one day's worth of data in an Amazon ES cluster.
The company has very slow query performance on the Amazon ES index and occasionally sees errors from Kinesis Data Firehose when attempting to write to the index. The Amazon ES cluster has 10 nodes running a single index and 3 dedicated master nodes. Each data node has 1.5 TB of Amazon EBS storage attached and the cluster is configured with 1,000 shards. Occasionally, JVMMemoryPressure errors are found in the cluster logs.
Which solution will improve the performance of Amazon ES?
- A. Decrease the number of Amazon ES shards for the index.
- B. Increase the number of Amazon ES shards for the index.
- C. Decrease the number of Amazon ES data nodes.
- D. Increase the memory of the Amazon ES master nodes.
Answer: A
Explanation:
Explanation
https://aws.amazon.com/premiumsupport/knowledge-center/high-jvm-memory-pressure-elasticsearch/
NEW QUESTION # 114
A company wants to improve the data load time of a sales data dashboard. Data has been collected as .csv files and stored within an Amazon S3 bucket that is partitioned by date. The data is then loaded to an Amazon Redshift data warehouse for frequent analysis. The data volume is up to 500 GB per day.
Which solution will improve the data loading performance?
- A. Compress .csv files and use an INSERT statement to ingest data into Amazon Redshift.
- B. Split large .csv files, then use a COPY command to load data into Amazon Redshift.
- C. Use Amazon Kinesis Data Firehose to ingest data into Amazon Redshift.
- D. Load the .csv files in an unsorted key order and vacuum the table in Amazon Redshift.
Answer: B
Explanation:
https://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
NEW QUESTION # 115
A bank is building an Amazon S3 data lake. The bank wants a single data repository for customer data needs, such as personalized recommendations. The bank needs to use Amazon Kinesis Data Firehose to ingest customers' personal information, bank accounts, and transactions in near real time from a transactional relational database.
All personally identifiable information (Pll) that is stored in the S3 bucket must be masked. The bank has enabled versioning for the S3 bucket.
Which solution will meet these requirements?
- A. Create an AWS Lambda function to read the objects, mask the Pll, and store the objects back with same key. Invoke the Lambda function from S3 events.
- B. Configure server-side encryption (SSE) for the S3 bucket. Invoke an AWS Lambda function from S3 events to mask the PII.
- C. Invoke an AWS Lambda function from Kinesis Data Firehose to mask the PII before Kinesis Data Firehose delivers the data to the S3 bucket.
- D. Use Amazon Macie to scan the S3 bucket. Configure Macie to discover Pll. Invoke an AWS Lambda function from S3 events to mask the Pll.
Answer: C
NEW QUESTION # 116
A company needs to store objects containing log data in JSON format. The objects are generated by eight applications running in AWS. Six of the applications generate a total of 500 KiB of data per second, and two of the applications can generate up to 2 MiB of data per second.
A data engineer wants to implement a scalable solution to capture and store usage data in an Amazon S3 bucket. The usage data objects need to be reformatted, converted to .csv format, and then compressed before they are stored in Amazon S3. The company requires the solution to include the least custom code possible and has authorized the data engineer to request a service quota increase if needed.
Which solution meets these requirements?
- A. Store usage data objects in an Amazon DynamoDB table. Configure a DynamoDB stream to copy the objects to an S3 bucket. Configure an AWS Lambda function to be triggered when objects are written to the S3 bucket. Have the function convert the objects into .csv format.
- B. Configure an Amazon Kinesis Data Firehose delivery stream for each application. Write AWS Lambda functions to read log data objects from the stream for each application. Have the function perform reformatting and .csv conversion. Enable compression on all the delivery streams.
- C. Configure an Amazon Kinesis data stream for each application. Write an AWS Lambda function to read usage data objects from the stream for each application. Have the function perform .csv conversion, reformatting, and compression of the data. Have the function store the output in Amazon S3.
- D. Configure an Amazon Kinesis data stream with one shard per application. Write an AWS Lambda function to read usage data objects from the shards. Have the function perform .csv conversion, reformatting, and compression of the data. Have the function store the output in Amazon S3.
Answer: B
NEW QUESTION # 117
A company wants to provide its data analysts with uninterrupted access to the data in its Amazon Redshift cluster. All data is streamed to an Amazon S3 bucket with Amazon Kinesis Data Firehose. An AWS Glue job that is scheduled to run every 5 minutes issues a COPY command to move the data into Amazon Redshift.
The amount of data delivered is uneven throughout the day, and cluster utilization is high during certain periods. The COPY command usually completes within a couple of seconds. However, when load spike occurs, locks can exist and data can be missed. Currently, the AWS Glue job is configured to run without retries, with timeout at 5 minutes and concurrency at 1.
How should a data analytics specialist configure the AWS Glue job to optimize fault tolerance and improve data availability in the Amazon Redshift cluster?
- A. Keep the number of retries at 0. Decrease the timeout value. Increase the job concurrency.
- B. Keep the number of retries at 0. Increase the timeout value. Keep the job concurrency at 1.
- C. Increase the number of retries. Decrease the timeout value. Increase the job concurrency.
- D. Keep the number of retries at 0. Decrease the timeout value. Keep the job concurrency at 1.
Answer: A
NEW QUESTION # 118
An online retail company with millions of users around the globe wants to improve its ecommerce analytics capabilities. Currently, clickstream data is uploaded directly to Amazon S3 as compressed files. Several times each day, an application running on Amazon EC2 processes the data and makes search options and reports available for visualization by editors and marketers. The company wants to make website clicks and aggregated data available to editors and marketers in minutes to enable them to connect with users more effectively.
Which options will help meet these requirements in the MOST efficient way? (Choose two.)
- A. Upload clickstream records to Amazon S3 as compressed files. Then use AWS Lambda to send data to Amazon Elasticsearch Service from Amazon S3.
- B. Use Kibana to aggregate, filter, and visualize the data stored in Amazon Elasticsearch Service. Refresh content performance dashboards in near-real time.
- C. Upload clickstream records from Amazon S3 to Amazon Kinesis Data Streams and use a Kinesis Data Streams consumer to send records to Amazon Elasticsearch Service.
- D. Use Amazon Kinesis Data Firehose to upload compressed and batched clickstream records to Amazon Elasticsearch Service.
- E. Use Amazon Elasticsearch Service deployed on Amazon EC2 to aggregate, filter, and process the data.
Refresh content performance dashboards in near-real time.
Answer: B,D
NEW QUESTION # 119
A retail company's data analytics team recently created multiple product sales analysis dashboards for the average selling price per product using Amazon QuickSight. The dashboards were created from .csv files uploaded to Amazon S3. The team is now planning to share the dashboards with the respective external product owners by creating individual users in Amazon QuickSight. For compliance and governance reasons, restricting access is a key requirement. The product owners should view only their respective product analysis in the dashboard reports.
Which approach should the data analytics team take to allow product owners to view only their products in the dashboard?
- A. Separate the data by product and use IAM policies for authorization.
- B. Create dataset rules with row-level security.
- C. Create a manifest file with row-level security.
- D. Separate the data by product and use S3 bucket policies for authorization.
Answer: B
Explanation:
Explanation
https://docs.aws.amazon.com/quicksight/latest/user/restrict-access-to-a-data-set-using-row-level-security.html
NEW QUESTION # 120
A smart home automation company must efficiently ingest and process messages from various connected devices and sensors. The majority of these messages are comprised of a large number of small files. These messages are ingested using Amazon Kinesis Data Streams and sent to Amazon S3 using a Kinesis data stream consumer application. The Amazon S3 message data is then passed through a processing pipeline built on Amazon EMR running scheduled PySpark jobs.
The data platform team manages data processing and is concerned about the efficiency and cost of downstream data processing. They want to continue to use PySpark.
Which solution improves the efficiency of the data processing jobs and is well architected?
- A. Set up an AWS Lambda function with a Python runtime environment. Process individual Kinesis data stream messages from the connected devices and sensors using Lambda.
- B. Set up AWS Glue Python jobs to merge the small data files in Amazon S3 into larger files and transform them to Apache Parquet format. Migrate the downstream PySpark jobs from Amazon EMR to AWS Glue.
- C. Send the sensor and devices data directly to a Kinesis Data Firehose delivery stream to send the data to Amazon S3 with Apache Parquet record format conversion enabled. Use Amazon EMR running PySpark to process the data in Amazon S3.
- D. Launch an Amazon Redshift cluster. Copy the collected data from Amazon S3 to Amazon Redshift and move the data processing jobs from Amazon EMR to Amazon Redshift.
Answer: B
Explanation:
Explanation
https://aws.amazon.com/it/about-aws/whats-new/2020/04/aws-glue-now-supports-serverless-streaming-etl/
NEW QUESTION # 121
A company launched a service that produces millions of messages every day and uses Amazon Kinesis Data Streams as the streaming service.
The company uses the Kinesis SDK to write data to Kinesis Data Streams. A few months after launch, a data analyst found that write performance is significantly reduced. The data analyst investigated the metrics and determined that Kinesis is throttling the write requests. The data analyst wants to address this issue without significant changes to the architecture.
Which actions should the data analyst take to resolve this issue? (Choose two.)
- A. Choose partition keys in a way that results in a uniform record distribution across shards.
- B. Replace the Kinesis API-based data ingestion mechanism with Kinesis Agent.
- C. Increase the Kinesis Data Streams retention period to reduce throttling.
- D. Increase the number of shards in the stream using the UpdateShardCount API.
- E. Customize the application code to include retry logic to improve performance.
Answer: A,D
Explanation:
https://aws.amazon.com/blogs/big-data/under-the-hood-scaling-your-kinesis-data-streams/
NEW QUESTION # 122
A media company has been performing analytics on log data generated by its applications. There has been a recent increase in the number of concurrent analytics jobs running, and the overall performance of existing jobs is decreasing as the number of new jobs is increasing. The partitioned data is stored in Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA) and the analytic processing is performed on Amazon EMR clusters using the EMR File System (EMRFS) with consistent view enabled. A data analyst has determined that it is taking longer for the EMR task nodes to list objects in Amazon S3.
Which action would MOST likely increase the performance of accessing log data in Amazon S3?
- A. Use a lifecycle policy to change the S3 storage class to S3 Standard for the log data.
- B. Increase the read capacity units (RCUs) for the shared Amazon DynamoDB table.
- C. Use a hash function to create a random string and add that to the beginning of the object prefixes when storing the log data in Amazon S3.
- D. Redeploy the EMR clusters that are running slowly to a different Availability Zone.
Answer: D
NEW QUESTION # 123
An online gaming company is using an Amazon Kinesis Data Analytics SQL application with a Kinesis data stream as its source. The source sends three non-null fields to the application: player_id, score, and us_5_digit_zip_code.
A data analyst has a .csv mapping file that maps a small number of us_5_digit_zip_code values to a territory code. The data analyst needs to include the territory code, if one exists, as an additional output of the Kinesis Data Analytics application.
How should the data analyst meet this requirement while minimizing costs?
- A. Store the contents of the mapping file in an Amazon DynamoDB table. Change the Kinesis Data Analytics application to send its output to an AWS Lambda function that fetches the mapping and supplements each record to include the territory code, if one exists. Forward the record from the Lambda function to the original application destination.
- B. Store the mapping file in an Amazon S3 bucket and configure the reference data column headers for the
.csv file in the Kinesis Data Analytics application. Change the SQL query in the application to include a join to the file's S3 Amazon Resource Name (ARN), and add the territory code field to the SELECT columns. - C. Store the mapping file in an Amazon S3 bucket and configure it as a reference data source for the Kinesis Data Analytics application. Change the SQL query in the application to include a join to the reference table and add the territory code field to the SELECT columns.
- D. Store the contents of the mapping file in an Amazon DynamoDB table. Preprocess the records as they arrive in the Kinesis Data Analytics application with an AWS Lambda function that fetches the mapping and supplements each record to include the territory code, if one exists. Change the SQL query in the application to include the new field in the SELECT statement.
Answer: C
NEW QUESTION # 124
A company has collected more than 100 TB of log files in the last 24 months. The files are stored as raw text in a dedicated Amazon S3 bucket. Each object has a key of the form year-month-day_log_HHmmss.txt where HHmmss represents the time the log file was initially created. A table was created in Amazon Athena that points to the S3 bucket. One-time queries are run against a subset of columns in the table several times an hour.
A data analyst must make changes to reduce the cost of running these queries. Management wants a solution with minimal maintenance overhead.
Which combination of steps should the data analyst take to meet these requirements? (Choose three.)
- A. Drop and recreate the table with the PARTITIONED BY clause. Run the ALTER TABLE ADD PARTITION statement.
- B. Convert the log files to Apache Parquet format.
- C. Add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data.
- D. Add a key prefix of the form year-month-day/ to the S3 objects to partition the data.
- E. Convert the log files to Apace Avro format.
- F. Drop and recreate the table with the PARTITIONED BY clause. Run the MSCK REPAIR TABLE statement.
Answer: B,C,F
Explanation:
Section: (none)
Explanation
NEW QUESTION # 125
A company stores its sales and marketing data that includes personally identifiable information (PII) in Amazon S3. The company allows its analysts to launch their own Amazon EMR cluster and run analytics reports with the data. To meet compliance requirements, the company must ensure the data is not publicly accessible throughout this process. A data engineer has secured Amazon S3 but must ensure the individual EMR clusters created by the analysts are not exposed to the public internet.
Which solution should the data engineer to meet this compliance requirement with LEAST amount of effort?
- A. Check the security group of the EMR clusters regularly to ensure it does not allow inbound traffic from IPv4 0.0.0.0/0 or IPv6 ::/0.
- B. Create an EMR security configuration and ensure the security configuration is associated with the EMR clusters when they are created.
- C. Use AWS WAF to block public internet access to the EMR clusters across the board.
- D. Enable the block public access setting for Amazon EMR at the account level before any EMR cluster is created.
Answer: A
NEW QUESTION # 126
A company has developed an Apache Hive script to batch process data stared in Amazon S3. The script needs to run once every day and store the output in Amazon S3. The company tested the script, and it completes within 30 minutes on a small local three-node cluster.
Which solution is the MOST cost-effective for scheduling and executing the script?
- A. Use the AWS Management Console to spin up an Amazon EMR cluster with Python Hue. Hive, and Apache Oozie. Set the termination protection flag to true and use Spot Instances for the core nodes of the cluster. Configure an Oozie workflow in the cluster to invoke the Hive script daily.
- B. Use AWS Lambda layers and load the Hive runtime to AWS Lambda and copy the Hive script. Schedule the Lambda function to run daily by creating a workflow using AWS Step Functions.
- C. Create an AWS Glue job with the Hive script to perform the batch operation. Configure the job to run once a day using a time-based schedule.
- D. Create an AWS Lambda function to spin up an Amazon EMR cluster with a Hive execution step. Set KeepJobFlowAliveWhenNoSteps to false and disable the termination protection flag. Use Amazon CloudWatch Events to schedule the Lambda function to run daily.
Answer: C
NEW QUESTION # 127
A data analyst is using Amazon QuickSight for data visualization across multiple datasets generated by applications. Each application stores files within a separate Amazon S3 bucket. AWS Glue Data Catalog is used as a central catalog across all application data in Amazon S3. A new application stores its data within a separate S3 bucket. After updating the catalog to include the new application data source, the data analyst created a new Amazon QuickSight data source from an Amazon Athena table, but the import into SPICE failed.
How should the data analyst resolve the issue?
- A. Edit the permissions for the new S3 bucket from within the S3 console.
- B. Edit the permissions for the AWS Glue Data Catalog from within the AWS Glue console.
- C. Edit the permissions for the new S3 bucket from within the Amazon QuickSight console.
- D. Edit the permissions for the AWS Glue Data Catalog from within the Amazon QuickSight console.
Answer: C
NEW QUESTION # 128
A media company is using Amazon QuickSight dashboards to visualize its national sales dat a. The dashboard is using a dataset with these fields: ID, date, time_zone, city, state, country, longitude, latitude, sales_volume, and number_of_items.
To modify ongoing campaigns, the company wants an interactive and intuitive visualization of which states across the country recorded a significantly lower sales volume compared to the national average.
Which addition to the company's QuickSight dashboard will meet this requirement?
- A. A pivot table of sales volume data summed up at the state level.
- B. A drill-down layer for state-level sales volume data.
- C. A drill through to other dashboards containing state-level sales volume data.
- D. A geospatial color-coded chart of sales volume data across the country.
Answer: A
NEW QUESTION # 129
A company has an application that uses the Amazon Kinesis Client Library (KCL) to read records from a Kinesis data stream.
After a successful marketing campaign, the application experienced a significant increase in usage. As a result, a data analyst had to split some shards in the data stream. When the shards were split, the application started throwing an ExpiredIteratorExceptions error sporadically.
What should the data analyst do to resolve this?
- A. Increase the number of threads that process the stream records.
- B. Decrease the provisioned write capacity units assigned to the stream's Amazon DynamoDB table.
- C. Increase the provisioned read capacity units assigned to the stream's Amazon DynamoDB table.
- D. Increase the provisioned write capacity units assigned to the stream's Amazon DynamoDB table.
Answer: D
NEW QUESTION # 130
......
Latest Verified & Correct AWS-Certified-Data-Analytics-Specialty Questions: https://pass4sure.actual4cert.com/AWS-Certified-Data-Analytics-Specialty-pass4sure-vce.html