QUESTION 6

A data engineering team has two tables. The first table march_transactions is a collection of all retail transactions in the month of March. The second table april_transactions is a collection of all retail transactions in the month of April. There are no duplicate records between the tables.
Which of the following commands should be run to create a new table all_transactions that contains all records from march_transactions and april_transactions without duplicate records?

Correct Answer: B
To create all_transactions with every record from march_transactions and april_transactions and no duplicate records, use the UNION operator, as in option B. UNION combines the result sets of the two queries and removes duplicate rows, whereas UNION ALL would keep them.
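The behaviour of UNION can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Databricks SQL, which is an assumption made so the example runs anywhere; the columns and sample rows are hypothetical, and only the table names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows; the question only names the tables.
conn.execute("CREATE TABLE march_transactions (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE april_transactions (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO march_transactions VALUES (?, ?)", [(1, 9.99), (2, 24.50)]
)
conn.executemany(
    "INSERT INTO april_transactions VALUES (?, ?)", [(3, 5.00), (4, 12.25)]
)

# UNION keeps one copy of each distinct row across both SELECTs.
conn.execute(
    """
    CREATE TABLE all_transactions AS
    SELECT * FROM march_transactions
    UNION
    SELECT * FROM april_transactions
    """
)

row_count = conn.execute("SELECT COUNT(*) FROM all_transactions").fetchone()[0]

# UNION ALL simply concatenates; with no overlapping rows the count matches.
union_all_count = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT * FROM march_transactions
        UNION ALL
        SELECT * FROM april_transactions
    )
    """
).fetchone()[0]

print(row_count, union_all_count)
```

Because the sample tables share no rows, both counts agree; they would differ only if the same row appeared more than once across the two inputs.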

QUESTION 7

A data engineer is using the following code block as part of a batch ingestion pipeline to read from a composable table:
[Exhibit: code block not reproduced]
Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?

Correct Answer: E
https://docs.databricks.com/en/structured-streaming/delta-lake.html
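Since the exhibit is not reproduced here, the following is only a sketch of the general difference between a batch read and a streaming read in PySpark, not the code from the question. It assumes an existing SparkSession is passed in as spark; the function name load_transactions and the as_stream flag are hypothetical.

```python
def load_transactions(spark, as_stream=False):
    """Return the `transactions` table as a batch or a streaming DataFrame.

    `spark` is assumed to be an existing SparkSession; nothing here
    imports pyspark, so the function only runs inside a Spark environment.
    """
    # spark.read returns a DataFrameReader for one-off batch reads;
    # spark.readStream returns a DataStreamReader for incremental reads.
    reader = spark.readStream if as_stream else spark.read
    return reader.table("transactions")
```

Calling load_transactions(spark, as_stream=True) would treat the table as a stream source, while the default call performs an ordinary batch read.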

QUESTION 8

A data engineer needs to apply custom logic to string column city in table stores for a specific use case. In order to apply this custom logic at scale, the data engineer wants to create a SQL user-defined function (UDF).
Which of the following code blocks creates this SQL UDF?
A. [Exhibit not reproduced]
B. [Exhibit not reproduced]
C. [Exhibit not reproduced]
D. [Exhibit not reproduced]
E. [Exhibit not reproduced]

Correct Answer: A
https://www.databricks.com/blog/2021/10/20/introducing-sql-user-defined-functions.html
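The answer options are not reproduced here, so this is a sketch of the general shape of a SQL UDF definition (CREATE FUNCTION ... RETURNS ... RETURN ...) rather than option A itself. The function name clean_city and its logic are hypothetical, and spark is assumed to be an existing SparkSession; only the stores table and the city column come from the question.

```python
# Hypothetical UDF: the name `clean_city` and the trimming/capitalising
# logic are placeholders for whatever custom logic the use case needs.
CREATE_CITY_UDF = """
CREATE OR REPLACE FUNCTION clean_city(city STRING)
RETURNS STRING
RETURN initcap(trim(city))
"""

# Once created, the SQL UDF is called like any built-in function.
APPLY_CITY_UDF = "SELECT clean_city(city) AS city FROM stores"


def apply_city_udf(spark):
    """Register the SQL UDF, then apply it to the `city` column of `stores`.

    `spark` is assumed to be an existing SparkSession.
    """
    spark.sql(CREATE_CITY_UDF)
    return spark.sql(APPLY_CITY_UDF)
```

A SQL UDF declares typed parameters, a return type after RETURNS, and a single expression after RETURN.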

QUESTION 9

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.
Which of the following describes why Auto Loader inferred all of the columns to be of the string type?

Correct Answer: B
JSON is a text-based format, and JSON files do not carry a declared schema or column types. By default, when Auto Loader infers the schema of a format that does not encode data types (such as JSON or CSV), it reads every column as a string, which avoids schema evolution problems caused by type mismatches. For example, a field that only ever holds true or false is still loaded as a string column rather than a boolean.
To get typed columns, the data engineer can enable type inference with the cloudFiles.inferColumnTypes option, or supply schema hints through cloudFiles.schemaHints to set the types of specific columns.
Therefore, the correct answer is B: JSON data is a text-based format.
https://docs.databricks.com/en/ingestion/auto-loader/schema.html
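As a rough sketch of how type inference and schema hints might be switched on, the helper below builds the Auto Loader options as a plain dictionary. The option keys are Auto Loader's documented cloudFiles.* options; the function names, paths, and hinted columns are hypothetical, and spark is assumed to be an existing SparkSession on Databricks.

```python
def autoloader_options(schema_location, infer_types=True, schema_hints=None):
    """Build Auto Loader options for a JSON source (hypothetical helper)."""
    opts = {
        "cloudFiles.format": "json",
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }
    if infer_types:
        # Without this, JSON columns are inferred as strings.
        opts["cloudFiles.inferColumnTypes"] = "true"
    if schema_hints:
        # Comma-separated "column type" pairs for specific columns.
        opts["cloudFiles.schemaHints"] = schema_hints
    return opts


def read_json_stream(spark, source_path, schema_location, schema_hints=None):
    """Start an Auto Loader stream; `spark` is an existing SparkSession."""
    opts = autoloader_options(schema_location, schema_hints=schema_hints)
    return spark.readStream.format("cloudFiles").options(**opts).load(source_path)
```

For instance, autoloader_options("/tmp/schema", schema_hints="price float, in_stock boolean") would request inferred types overall and pin two hypothetical columns to float and boolean.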

QUESTION 10

A data engineer needs to create a table in Databricks using data from their organization’s existing SQLite database.
They run the following command:
[Exhibit: command not reproduced]
Which of the following lines of code fills in the above blank to successfully complete the task?

Correct Answer: A
CREATE TABLE new_employees_table
USING JDBC
OPTIONS (
  url "<jdbc_url>",
  dbtable "<table_name>",
  user '<username>',
  password '<password>'
) AS
SELECT * FROM employees_table_vw
https://docs.databricks.com/external-data/jdbc.html#language-sql