2024 Updated Snowflake DSA-C02 Certification Study Guide Pass DSA-C02 Fast [Q16-Q34]

DSA-C02 Dumps PDF 2024 Program Your Preparation EXAM SUCCESS

NEW QUESTION # 16
Consider a data frame df with columns ['A', 'B', 'C', 'D'] and rows ['r1', 'r2', 'r3']. What does the expression df[lambda x : x.index.str.endswith('3')] do?

  • A. Returns the third column
  • B. Results in Error
  • C. Filters the row labelled r3
  • D. Returns the row name r3

Answer: C

Explanation:
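It filters the rows. The lambda receives df and returns a boolean mask that is True only for row labels ending in '3', so the result is a DataFrame containing only the row labelled r3.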
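A minimal pandas sketch of the expression, assuming a small frame with placeholder values. Indexing a DataFrame with a callable passes the frame itself to the callable, and the boolean mask it returns selects rows, not columns.

```python
import pandas as pd

# Frame matching the question: columns A-D, row labels r1-r3 (values are placeholders).
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    columns=["A", "B", "C", "D"],
    index=["r1", "r2", "r3"],
)

# The lambda is called with df; it returns one boolean per row label.
mask = df.index.str.endswith("3")

# A boolean mask inside df[...] keeps only the rows where it is True.
filtered = df[lambda x: x.index.str.endswith("3")]
print(filtered)  # a one-row DataFrame whose only row is r3
```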


NEW QUESTION # 17
Which Snowflake parameter can be used to automatically suspend tasks that run data science pipelines after a specified number of failed runs?

  • A. SUSPEND_TASK_AUTO_NUM_FAILURES
  • B. SUSPEND_TASK
  • C. SUSPEND_TASK_AFTER_NUM_FAILURES
  • D. There is none as such available.

Answer: C

Explanation:
Automatically Suspend Tasks After Failed Runs
Optionally suspend tasks automatically after a specified number of consecutive runs that either fail or time out.
This feature can reduce costs by suspending tasks that consume Snowflake credits but fail to run to completion. Failed task runs include runs in which the SQL code in the task body either produces a user error or times out. Task runs that are skipped, canceled, or that fail due to a system error are considered indeterminate and are not included in the count of failed task runs.
Set the SUSPEND_TASK_AFTER_NUM_FAILURES = num parameter on a standalone task or the root task in a DAG. When the parameter is set to a value greater than 0, the following behavior applies to runs of the standalone task or DAG:
Standalone tasks are automatically suspended after the specified number of consecutive task runs either fail or time out.
The root task is automatically suspended after the run of any single task in a DAG fails or times out the specified number of times in consecutive runs.
The parameter can be set when creating a task (using CREATE TASK) or later (using ALTER TASK). The setting applies to tasks that rely on either Snowflake-managed compute resources (i.e. serverless compute model) or user-managed compute resources (i.e. a virtual warehouse).
The SUSPEND_TASK_AFTER_NUM_FAILURES parameter can also be set at the account, database, or schema level. The setting applies to all standalone or root tasks contained in the modified object. Note that explicitly setting the parameter at a lower (i.e. more granular) level overrides the parameter value set at a higher level.
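The counting rule above can be sketched as a toy model in Python. This is an illustration only, not Snowflake code: the outcome names are hypothetical, and the model assumes that indeterminate runs (skipped, canceled, system error) neither add to nor reset the streak of consecutive failures, which the text above does not state explicitly. In SQL, the parameter itself is set with a statement such as ALTER TASK my_task SET SUSPEND_TASK_AFTER_NUM_FAILURES = 3, where my_task is a hypothetical task name.

```python
from enum import Enum


class RunOutcome(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"              # user error in the task body: counted
    TIMED_OUT = "timed_out"        # counted
    SKIPPED = "skipped"            # indeterminate
    CANCELED = "canceled"          # indeterminate
    SYSTEM_ERROR = "system_error"  # indeterminate


COUNTED = {RunOutcome.FAILED, RunOutcome.TIMED_OUT}
INDETERMINATE = {RunOutcome.SKIPPED, RunOutcome.CANCELED, RunOutcome.SYSTEM_ERROR}


def run_index_at_suspension(outcomes, suspend_after_num_failures):
    """Return the 0-based index of the run after which the task would be
    suspended, or None if it would keep running (toy model only)."""
    if suspend_after_num_failures <= 0:
        return None  # the behavior only applies to values greater than 0
    consecutive = 0
    for i, outcome in enumerate(outcomes):
        if outcome in COUNTED:
            consecutive += 1
            if consecutive >= suspend_after_num_failures:
                return i
        elif outcome in INDETERMINATE:
            continue  # assumption: not counted and does not reset the streak
        else:
            consecutive = 0  # a successful run breaks the streak
    return None


F, T, S = RunOutcome.FAILED, RunOutcome.TIMED_OUT, RunOutcome.SUCCEEDED
print(run_index_at_suspension([F, S, F, F, T], 3))  # 4
```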


NEW QUESTION # 18
Which of the following is not a type of window function in Snowflake?

  • A. Rank-related functions.
  • B. Aggregation window functions.
  • C. Association functions.
  • D. Window frame functions.

Answer: B,C

Explanation:
Window Functions
A window function operates on a group ("window") of related rows.
Each time a window function is called, it is passed a row (the current row in the window) and the window of rows that contain the current row. The window function returns one output row for each input row. The output depends on the individual row passed to the function and the values of the other rows in the window passed to the function.
Some window functions are order-sensitive. There are two main types of order-sensitive window functions:
Rank-related functions.
Window frame functions.
Rank-related functions list information based on the "rank" of a row. For example, if you rank stores in descending order by profit per year, the store with the most profit will be ranked 1; the second-most profitable store will be ranked 2, etc.
Window frame functions allow you to perform rolling operations, such as calculating a running total or a moving average, on a subset of the rows in the window.
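To make the two order-sensitive families concrete, here is a plain-Python sketch of what a RANK() call and a window frame compute, assuming the rows are already sorted in window order. The profit figures are hypothetical, and the helpers only mimic the SQL semantics rather than call Snowflake.

```python
def rank_desc(values):
    """RANK()-style ranking in descending order: ties share a rank and the
    following rank skips ahead (1, 1, 3, ...)."""
    return [1 + sum(1 for other in values if other > v) for v in values]


def running_total(values):
    """SUM over a frame from the first row up to the current row."""
    totals, acc = [], 0
    for v in values:
        acc += v
        totals.append(acc)
    return totals


def moving_average(values, preceding=2):
    """AVG over a frame of up to `preceding` earlier rows plus the current row."""
    averages = []
    for i in range(len(values)):
        frame = values[max(0, i - preceding):i + 1]
        averages.append(sum(frame) / len(frame))
    return averages


profits = [300, 500, 500, 100]  # hypothetical yearly profit per store
print(rank_desc(profits))       # [3, 1, 1, 4]
print(running_total(profits))   # [300, 800, 1300, 1400]
print(moving_average([3, 6, 9, 12]))  # [3.0, 4.5, 6.0, 9.0]
```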


NEW QUESTION # 19
Which learning methodology applies the conditional probability of all the variables with respect to the dependent variable?

  • A. Artificial learning
  • B. Unsupervised learning
  • C. Reinforcement learning
  • D. Supervised learning

Answer: D

Explanation:
Supervised learning applies the conditional probability of all the variables with respect to the dependent variable. Conditional probability is a basic method of estimating the statistics of random experiments.
Conditional probability is the likelihood of an event or outcome occurring given that some other event or prior outcome has occurred. Two events are said to be independent if the occurrence of one does not affect the probability that the other will occur.
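The definition P(A | B) = P(A and B) / P(B) can be estimated directly from counts in labelled data, which is what supervised methods such as naive Bayes do for each feature value and target value. A minimal sketch with hypothetical records:

```python
from fractions import Fraction


def conditional_probability(records, event, given):
    """Estimate P(event | given) = P(event and given) / P(given) from counts."""
    given_rows = [r for r in records if given(r)]
    if not given_rows:
        raise ValueError("the conditioning event never occurs")
    both = sum(1 for r in given_rows if event(r))
    return Fraction(both, len(given_rows))


# Hypothetical labelled rows: (feature value, target label)
records = [("sunny", "play"), ("sunny", "play"), ("sunny", "stay"),
           ("rainy", "stay"), ("rainy", "stay"), ("rainy", "play")]

p = conditional_probability(records,
                            event=lambda r: r[1] == "play",
                            given=lambda r: r[0] == "sunny")
print(p)  # 2/3
```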


NEW QUESTION # 20
Mark the incorrect statement regarding Python UDFs.

  • A. Python UDFs can contain both new code and calls to existing packages
  • B. For each row passed to a UDF, the UDF returns either a scalar (i.e. single) value or, if defined as a table function, a set of rows.
  • C. A scalar function (UDF) returns a tabular value for each input row
  • D. A UDF also gives you a way to encapsulate functionality so that you can call it repeatedly from multiple places in code

Answer: C

Explanation:
A scalar function (UDF) returns one output row for each input row. The returned row consists of a single column/value, not a tabular value.
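A plain-Python sketch of the two handler shapes, assuming Snowflake's convention that a scalar UDF handler is a function called once per row and a table function (UDTF) handler is a class whose process() method yields tuples. The local driver functions are hypothetical and only mimic how rows are fed to each handler; registering the handlers with CREATE FUNCTION ... LANGUAGE PYTHON is not shown.

```python
# Scalar UDF handler: one input row in, one value out.
def add_one(x: int) -> int:
    return x + 1


# Table function (UDTF) handler: one input row in, zero or more rows out.
class SplitWords:
    def process(self, text: str):
        for word in text.split():
            yield (word,)


# Hypothetical local drivers that mimic how rows are passed to each handler.
def apply_scalar(func, rows):
    return [func(r) for r in rows]


def apply_table_function(handler_cls, rows):
    handler = handler_cls()
    return [out for r in rows for out in handler.process(r)]


print(apply_scalar(add_one, [1, 2, 3]))                # [2, 3, 4]
print(apply_table_function(SplitWords, ["a b", "c"]))  # [('a',), ('b',), ('c',)]
```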


NEW QUESTION # 21
Mark the correct steps for saving the contents of a DataFrame to a Snowflake table as part of moving data from Spark to Snowflake.

  • A. Step 1.Use the writer() method of the DataFrame to construct a DataFrameWriter.
    Step 2.Specify SNOWFLAKE_SOURCE_NAME using the format() method.
    Step 3.Use the dbtable option to specify the table to which data is written.
    Step 4.Specify the connector options using either the option() or options() method.
    Step 5.Use the save() method to specify the save mode for the content.
  • B. Step 1.Use the PUT() method of the DataFrame to construct a DataFrameWriter.
    Step 2.Specify SNOWFLAKE_SOURCE_NAME using the NAME() method.
    Step 3.Use the dbtable option to specify the table to which data is written.
    Step 4.Specify the connector options using either the option() or options() method.
    Step 5.Use the save() method to specify the save mode for the content.
  • C. Step 1.Use the write() method of the DataFrame to construct a DataFrameWriter.
    Step 2.Specify SNOWFLAKE_SOURCE_NAME using the format() method.
    Step 3.Specify the connector options using either the option() or options() method.
    Step 4.Use the dbtable option to specify the table to which data is written.
    Step 5.Use the mode() method to specify the save mode for the content.
  • D. Step 1.Use the PUT() method of the DataFrame to construct a DataFrameWriter.
    Step 2.Specify SNOWFLAKE_SOURCE_NAME using the format() method.
    Step 3.Specify the connector options using either the option() or options() method.
    Step 4.Use the dbtable option to specify the table to which data is written.
    Step 5.Use the save() method to specify the save mode for the content.

Answer: C

Explanation:
Moving Data from Spark to Snowflake
The steps for saving the contents of a DataFrame to a Snowflake table are similar to writing from Snowflake to Spark:
1. Use the write() method of the DataFrame to construct a DataFrameWriter.
2. Specify SNOWFLAKE_SOURCE_NAME using the format() method.
3. Specify the connector options using either the option() or options() method.
4. Use the dbtable option to specify the table to which data is written.
5. Use the mode() method to specify the save mode for the content.
Examples
df.write
  .format(SNOWFLAKE_SOURCE_NAME)
  .options(sfOptions)
  .option("dbtable", "t2")
  .mode(SaveMode.Overwrite)
  .save()
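For comparison, a PySpark sketch of the same five steps, assuming the Snowflake Spark connector is on the classpath and that its source name is net.snowflake.spark.snowflake. The option keys follow the connector's sfURL/sfUser style; all values are placeholders and the function names are hypothetical. PySpark's mode() accepts the string "overwrite" in place of SaveMode.Overwrite.

```python
# Source name of the Snowflake Spark connector (assumed to be installed).
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"


def build_sf_options(account_url, user, password, database, schema, warehouse):
    """Collect connector options; every value passed in is a placeholder."""
    return {
        "sfURL": account_url,
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def save_to_snowflake(df, sf_options, table):
    """Write a Spark DataFrame to a Snowflake table following the five steps."""
    (df.write                              # 1. construct a DataFrameWriter
       .format(SNOWFLAKE_SOURCE_NAME)      # 2. connector source name
       .options(**sf_options)              # 3. connector options
       .option("dbtable", table)           # 4. target table
       .mode("overwrite")                  # 5. save mode
       .save())
```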


NEW QUESTION # 22
Mark the incorrect statement regarding the MIN / MAX functions.

  • A. NULL values are ignored unless all the records are NULL, in which case a NULL value is returned
  • B. NULL values are skipped unless all the records are NULL
  • C. For compatibility with other systems, the DISTINCT keyword can be specified as an argument for MIN or MAX, but it does not have any effect
  • D. The data type of the returned value is the same as the data type of the input values

Answer: A

Explanation:
NULL values are ignored unless all the records are NULL, in which case a NULL value is returned
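The NULL handling can be checked with Python's built-in sqlite3 module, used here only as a stand-in for Snowflake because its MIN and MAX aggregates follow the same rule (NULLs ignored, NULL returned when every value is NULL). The table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp TEXT, v INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 3), ("a", None), ("a", 7), ("b", None), ("b", None)],
)

# Group a mixes NULLs with values; group b contains only NULLs.
rows = conn.execute(
    "SELECT grp, MIN(v), MAX(v) FROM t GROUP BY grp ORDER BY grp"
).fetchall()
print(rows)  # [('a', 3, 7), ('b', None, None)]

# DISTINCT is accepted but cannot change the result of MIN or MAX.
distinct_min = conn.execute("SELECT MIN(DISTINCT v) FROM t WHERE grp = 'a'").fetchone()[0]
print(distinct_min)  # 3
```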


NEW QUESTION # 23
Consider a data frame df with 10 rows and index [ 'r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10'].
What does the expression g = df.groupby(df.index.str.len()) do?

  • A. Data frames cannot be grouped by index values. Hence it results in Error.
  • B. Groups df based on index strings
  • C. Groups df based on index values
  • D. Groups df based on length of each index value

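Answer: D

Explanation:
Data frames can be grouped by values derived from the index. df.index.str.len() returns the length of each index label (2 for 'r1', 4 for 'row4', 5 for 'row10'), and groupby uses these lengths as the group keys, so the expression groups df based on the length of each index value.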
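A short pandas sketch confirming that the expression runs, using a placeholder value column:

```python
import pandas as pd

labels = ["r1", "r2", "r3", "row4", "row5", "row6", "r7", "r8", "r9", "row10"]
df = pd.DataFrame({"value": range(10)}, index=labels)

# One integer per row label; groupby accepts any array-like of keys
# with the same length as the rows.
g = df.groupby(df.index.str.len())

for length, group in g:
    print(length, list(group.index))
# 2 ['r1', 'r2', 'r3', 'r7', 'r8', 'r9']
# 4 ['row4', 'row5', 'row6']
# 5 ['row10']
```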


NEW QUESTION # 24
You are training a binary classification model to support admission approval decisions for a college degree program.
How can you evaluate if the model is fair, and doesn't discriminate based on ethnicity?

  • A. Evaluate each trained model with a validation dataset and use the model with the highest accuracy score.
  • B. Remove the ethnicity feature from the training dataset.
  • C. None of the above.
  • D. Compare disparity between selection rates and performance metrics across ethnicities.

Answer: D

Explanation:
By using ethnicity as a sensitive field, and comparing disparity between selection rates and performance metrics for each ethnicity value, you can evaluate the fairness of the model.
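A minimal sketch of that comparison in plain Python, assuming binary predictions (1 = admitted) and a hypothetical group label per applicant. Libraries such as Fairlearn offer this kind of per-group comparison; the helpers below are illustrative and not their API.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Share of positive predictions (1 = admitted) within each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, grp in zip(predictions, groups):
        totals[grp] += 1
        positives[grp] += pred
    return {g: positives[g] / totals[g] for g in totals}


def accuracy_by_group(predictions, labels, groups):
    """Share of correct predictions within each group."""
    totals, correct = defaultdict(int), defaultdict(int)
    for pred, label, grp in zip(predictions, labels, groups):
        totals[grp] += 1
        correct[grp] += int(pred == label)
    return {g: correct[g] / totals[g] for g in totals}


def disparity(per_group):
    """Largest minus smallest per-group value; 0 means no disparity."""
    return max(per_group.values()) - min(per_group.values())


preds  = [1, 0, 1, 1, 0, 0, 1, 0]
labels = [1, 0, 1, 0, 0, 0, 1, 1]
groups = ["x", "x", "x", "x", "y", "y", "y", "y"]  # hypothetical ethnicity values

rates = selection_rates(preds, groups)
print(rates, disparity(rates))                   # {'x': 0.75, 'y': 0.25} 0.5
print(accuracy_by_group(preds, labels, groups))  # {'x': 0.75, 'y': 0.75}
```

Here both groups have the same accuracy, yet group x is selected three times as often, which is the kind of disparity a single overall accuracy score would hide.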


NEW QUESTION # 25
Mark the data scientist's incorrect understandings about streams.

  • A. Streams on views support both local views and views shared using Snowflake Secure Data Sharing, including secure views.
  • B. Streams do not support repeatable read isolation.
  • C. Streams can track changes in materialized views.
  • D. Streams itself does not contain any table data.

Answer: B,C

Explanation:
Streams on views support both local views and views shared using Snowflake Secure Data Sharing, including secure views. Currently, streams cannot track changes in materialized views.
A stream itself does not contain any table data. A stream only stores an offset for the source object and returns CDC records by leveraging the versioning history for the source object. When the first stream for a table is created, several hidden columns are added to the source table and begin storing change tracking metadata.
These columns consume a small amount of storage. The CDC records returned when querying a stream rely on a combination of the offset stored in the stream and the change tracking metadata stored in the table. Note that for streams on views, change tracking must be enabled explicitly for the view and underlying tables to add the hidden columns to these tables.
Streams support repeatable read isolation. In repeatable read mode, multiple SQL statements within a transaction see the same set of records in a stream. This differs from the read committed mode supported for tables, in which statements see any changes made by previous statements executed within the same transaction, even though those changes are not yet committed.
The delta records returned by streams in a transaction are the range from the current position of the stream until the transaction start time. The stream position advances to the transaction start time if the transaction commits; otherwise it stays at the same position.
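The offset and repeatable-read behavior can be pictured with a toy Python model. It is a simplification for illustration, not how Snowflake implements streams: the change log stands in for the source table's versioning history, and the class names are hypothetical.

```python
class ToyStream:
    """Toy stream: stores only an offset into the source's change history,
    never any table data itself."""

    def __init__(self, change_log):
        self.change_log = change_log   # stands in for the versioning history
        self.offset = len(change_log)  # starts at the current table version

    def begin(self):
        # Repeatable read: the visible delta is fixed at transaction start.
        return ToyTransaction(self, end=len(self.change_log))


class ToyTransaction:
    def __init__(self, stream, end):
        self.stream = stream
        self.end = end

    def read(self):
        # Every read in the transaction sees the same range of changes.
        return self.stream.change_log[self.stream.offset:self.end]

    def commit(self):
        # On commit, the stream advances to the transaction start point.
        self.stream.offset = self.end
    # Without a commit (e.g. on rollback) the offset simply stays where it was.


history = []
stream = ToyStream(history)
history += ["insert row 1", "insert row 2"]
tx = stream.begin()
history.append("insert row 3")  # arrives after the transaction started
print(tx.read())                # ['insert row 1', 'insert row 2']
tx.commit()
print(stream.begin().read())    # ['insert row 3']
```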


NEW QUESTION # 26
Which of the following methods is used for multiclass classification?

  • A. loocv
  • B. all vs one
  • C. one vs rest
  • D. one vs another

Answer: C

Explanation:
Binary vs. Multi-Class Classification
Classification problems are common in machine learning. In most cases, developers use a supervised machine-learning approach to predict class labels for a given dataset. Unlike regression, classification involves designing a classifier model and training it to assign the test data to categories. Depending on the number of labels, the problem is either binary or multi-class.
As the name suggests, binary classification involves solving a problem with only two class labels. This makes it easy to filter the data, apply classification algorithms, and train the model to predict outcomes. On the other hand, multi-class classification is applicable when there are more than two class labels in the input training data.
The technique enables developers to categorize the test data into one of several class labels.
While binary classification requires only one classifier model, the number of models used in the multi-class approach depends on the classification technique. Two common strategies are one-vs-rest and one-vs-one; the one-vs-rest model is described below.
One-Vs-Rest Classification Model for Multi-Class Classification
Also known as one-vs-all, the one-vs-rest model is a heuristic method that uses a binary classification algorithm for multi-class classification. The technique splits a multi-class dataset into multiple binary classification problems. A binary classifier is then trained on each binary problem, and the most confident classifier makes the prediction.
For instance, a multi-class classification problem with red, green, and blue classes can be split into the following binary problems:
Problem one: red vs. green/blue
Problem two: blue vs. green/red
Problem three: green vs. blue/red
The main drawback of this model is that you must create one model for every class. The three classes above require three models, which can be challenging for large datasets with millions of rows, for slow models such as neural networks, and for datasets with a large number of classes.
The one-vs-rest approach requires each model to predict a probability-like score. The class with the largest score is then used as the prediction. As such, it is commonly used with classification algorithms that naturally predict scores or numerical class membership, such as the perceptron and logistic regression.
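A self-contained sketch of one-vs-rest, assuming logistic regression trained by plain batch gradient descent as the binary classifier and three hypothetical 2-D clusters for the red, green, and blue classes. In practice, scikit-learn's OneVsRestClassifier wraps any binary estimator in the same way.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, epochs=1000, lr=0.1):
    """Binary logistic regression fitted with batch gradient descent.
    xs: list of feature vectors, ys: list of 0/1 targets. Returns (weights, bias)."""
    n = len(xs)
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(model, x):
    """Probability-like score that x belongs to the model's positive class."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_one_vs_rest(xs, labels):
    """Train one binary model per class: that class -> 1, every other class -> 0."""
    return {c: train_logistic(xs, [1 if y == c else 0 for y in labels])
            for c in sorted(set(labels))}


def predict_one_vs_rest(models, x):
    """Predict the class whose binary model gives the largest score."""
    return max(models, key=lambda c: score(models[c], x))


# Hypothetical, well-separated training data for three classes.
X = [[0, 0], [0.5, 0.5], [1, 0], [0, 1],
     [5, 0], [6, 0.5], [5, 1], [6, 1],
     [0, 5], [0.5, 6], [1, 5], [1, 6]]
y = ["red"] * 4 + ["green"] * 4 + ["blue"] * 4

models = fit_one_vs_rest(X, y)  # three classes -> three binary models
print(predict_one_vs_rest(models, [5.5, 0.2]))
```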


NEW QUESTION # 27
Which of the following are types of visualization used for data exploration in data science?

  • A. Newton AI
  • B. Feature Distribution by Class
  • C. 2D-Density Plots
  • D. Sand Visualization
  • E. Heat Maps

Answer: B,C,E

Explanation:
Type of visualization used for exploration:
Correlation heatmap
Class distributions by feature
Two-Dimensional density plots.
All the visualizations are interactive, as is standard for Plotly.
For more details, please refer to the link below:
https://towardsdatascience.com/data-exploration-understanding-and-visualization-72657f5eac41
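The correlation heatmap listed above is a colored rendering of a pairwise Pearson correlation matrix. A plain-Python sketch of that matrix with hypothetical columns; the plotting step (for example with Plotly) is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def correlation_matrix(columns):
    """columns: dict of name -> numeric list. Returns {name: {name: r}}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


data = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}
m = correlation_matrix(data)
print({a: {b: round(r, 3) for b, r in row.items()} for a, row in m.items()})
```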


NEW QUESTION # 28
Which of the following are key actions in the data collection phase of machine learning?

  • A. Label
  • B. Probability
  • C. Ingest and Aggregate
  • D. Measure

Answer: A,C

Explanation:
The key actions in the data collection phase include:
Label: Labeled data is raw data that has been processed by adding one or more meaningful tags so that a model can learn from it. If such tags are missing, labeling the data (manually or automatically) takes some work.
Ingest and Aggregate: Incorporating and combining data from many data sources is part of data collection in AI.
Data collection
Collecting data for training the ML model is the basic step in the machine learning pipeline. The predictions made by ML systems can only be as good as the data on which they have been trained. Following are some of the problems that can arise in data collection:
Inaccurate data. The collected data could be unrelated to the problem statement.
Missing data. Parts of the data could be missing, for example empty values in columns or missing images for some class of prediction.
Data imbalance. Some classes or categories in the data may have a disproportionately high or low number of corresponding samples. As a result, they risk being under-represented in the model.
Data bias. Depending on how the data, subjects and labels themselves are chosen, the model could propagate inherent biases on gender, politics, age or region, for example. Data bias is difficult to detect and remove.
Several techniques can be applied to address those problems:
Pre-cleaned, freely available datasets. If the problem statement (for example, image classification or object recognition) aligns with a clean, pre-existing, properly formulated dataset, then take advantage of existing open-source expertise.
Web crawling and scraping. Automated tools, bots and headless browsers can crawl and scrape websites for data.
Private data. ML engineers can create their own data. This is helpful when the amount of data required to train the model is small and the problem statement is too specific to generalize over an open-source dataset.
Custom data. Agencies can create or crowdsource the data for a fee.
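Two of the problems listed above, data imbalance and missing data, can be checked with a few lines of Python before training. A minimal sketch with hypothetical labels and rows:

```python
from collections import Counter


def class_balance(labels):
    """Share of samples per class label; very uneven shares signal imbalance."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def missing_counts(rows, columns):
    """Count missing values per column; an absent key or None counts as missing."""
    return {c: sum(1 for row in rows if row.get(c) is None) for c in columns}


labels = ["spam"] * 9 + ["ham"]
print(class_balance(labels))  # {'spam': 0.9, 'ham': 0.1}

rows = [{"age": 31, "city": None}, {"age": None, "city": "Oslo"}, {"age": 45}]
print(missing_counts(rows, ["age", "city"]))  # {'age': 1, 'city': 2}
```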


NEW QUESTION # 29
Which of the following cross-validation methods is suitable for quicker cross-validation on very large datasets with hundreds of thousands of samples?

  • A. Holdout method
  • B. All of the above
  • C. Leave-one-out cross-validation
  • D. k-fold cross-validation

Answer: A

Explanation:
The holdout method is suitable for very large datasets because it is the simplest and quickest version of cross-validation to compute.
Holdout method
In this method, the dataset is divided into two sets namely the training and the test set with the basic property that the training set is bigger than the test set. Later, the model is trained on the training dataset and evaluated using the test dataset.
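A minimal holdout split in plain Python, assuming an 80/20 split and a fixed seed for reproducibility. Only one model is trained on the result, whereas k-fold cross-validation trains k models and leave-one-out trains one model per sample, which is why the holdout method is the quickest on very large datasets. scikit-learn's train_test_split provides the same kind of split.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle once, then cut into a larger training set and a smaller test set."""
    if not 0 < test_fraction < 0.5:
        raise ValueError("test_fraction must leave the training set larger")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = holdout_split(range(100_000))
print(len(train), len(test))  # 80000 20000
```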


NEW QUESTION # 30
What can a Snowflake data scientist do in the Snowflake Marketplace as a consumer?

  • A. Discover and test third-party data sources.
  • B. Receive frictionless access to raw data products from vendors.
  • C. Use the business intelligence (BI)/ML/Deep learning tools of her choice.
  • D. Combine new datasets with your existing data in Snowflake to derive new business insights.

Answer: A,B,C,D

Explanation:
As a consumer, you can do the following:
Discover and test third-party data sources.
Receive frictionless access to raw data products from vendors.
Combine new datasets with your existing data in Snowflake to derive new business insights.
Have datasets available instantly and updated continually for users.
Eliminate the costs of building and maintaining various APIs and data pipelines to load and update data.
Use the business intelligence (BI) tools of your choice.


NEW QUESTION # 31
Which command manually triggers a single run of a scheduled task (either a standalone task or the root task in a DAG) independent of the schedule defined for the task?

  • A. RUN TASK
  • B. EXECUTE TASK
  • C. RUN ROOT TASK
  • D. CALL TASK

Answer: B

Explanation:
The EXECUTE TASK command manually triggers a single run of a scheduled task (either a standalone task or the root task in a DAG) independent of the schedule defined for the task. A successful run of a root task triggers a cascading run of child tasks in the DAG as their precedent task completes, as though the root task had run on its defined schedule.
This SQL command is useful for testing new or modified standalone tasks and DAGs before you enable them to execute SQL code in production.
Call this SQL command directly in scripts or in stored procedures. In addition, this command supports integrating tasks in external data pipelines. Any third-party service that can authenticate into your Snowflake account and authorize SQL actions can execute the EXECUTE TASK command to run tasks.
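A minimal Python sketch of calling EXECUTE TASK from a script, assuming a DB-API style connection such as one returned by snowflake.connector.connect(...). The helper names are hypothetical, and the identifier check only accepts unquoted (optionally database- and schema-qualified) task names.

```python
import re

# Unquoted identifier, optionally qualified as database.schema.task.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}$")


def execute_task_sql(task_name):
    """Build an EXECUTE TASK statement after validating the task name."""
    if not _IDENTIFIER.match(task_name):
        raise ValueError(f"unexpected task name: {task_name!r}")
    return f"EXECUTE TASK {task_name}"


def run_task_once(conn, task_name):
    """Trigger one run of the task outside its schedule via a DB-API connection."""
    cur = conn.cursor()
    try:
        cur.execute(execute_task_sql(task_name))
    finally:
        cur.close()
```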


NEW QUESTION # 32
A data scientist, acting as a data provider, needs to allow consumers to access all databases and database objects in a share by granting a single privilege on the shared databases. Which SnowSQL command used by her in this task is incorrect?
Assuming:
A database named product_db exists with a schema named product_agg and a table named Item_agg.
The database, schema, and table will be shared with two accounts named xy12345 and yz23456.
1.USE ROLE accountadmin;
2.CREATE DIRECT SHARE product_s;
3.GRANT USAGE ON DATABASE product_db TO SHARE product_s;
4.GRANT USAGE ON SCHEMA product_db.product_agg TO SHARE product_s;
5.GRANT SELECT ON TABLE sales_db.product_agg.Item_agg TO SHARE product_s;
6.SHOW GRANTS TO SHARE product_s;
7.ALTER SHARE product_s ADD ACCOUNTS=xy12345, yz23456;
8.SHOW GRANTS OF SHARE product_s;

  • A. GRANT SELECT ON TABLE sales_db.product_agg.Item_agg TO SHARE product_s;
  • B. CREATE DIRECT SHARE product_s;
  • C. ALTER SHARE product_s ADD ACCOUNTS=xy12345, yz23456;
  • D. GRANT USAGE ON DATABASE product_db TO SHARE product_s;

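Answer: B

Explanation:
CREATE SHARE product_s is the correct SnowSQL command to create a share object; there is no CREATE DIRECT SHARE syntax. The remaining commands follow the documented steps for creating a share (note that, under the stated assumptions, the table in the GRANT SELECT statement lives in product_db, not sales_db).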
https://docs.snowflake.com/en/user-guide/data-sharing-provider#creating-a-share-using-sql


NEW QUESTION # 33
Which of the following objects in a database in your account does Secure Data Sharing not let you share with other Snowflake accounts?

  • A. Tables
  • B. External tables
  • C. Sequences
  • D. Secure UDFs

Answer: C

Explanation:
Secure Data Sharing lets you share selected objects in a database in your account with other Snowflake accounts. You can share the following Snowflake database objects:
Tables
External tables
Secure views
Secure materialized views
Secure UDFs
Snowflake enables the sharing of databases through shares, which are created by data providers and "imported" by data consumers.


NEW QUESTION # 34
......

Get Perfect Results with Premium DSA-C02 Dumps Updated 67 Questions: https://www.itexamreview.com/DSA-C02-exam-dumps.html

Free DSA-C02 Exam Study Guide for the NEW Dumps Test Engine: https://drive.google.com/open?id=161nDr2sG5TZlr5vh7E9pp2Akq9UEJQi_